Title of the Invention
Fungal Promoters Active in the Presence of Glucose 


Cross-Reference to Related Applications 

This application is a continuation-in-part of U.S. Application No.
07/496,155 filed March 19, 1990.

Background of the Invention 
I. Methods for the Identification of Promoters 

Many systems have been used to isolate genes and their promoters
located immediately upstream of the translation start site of a gene. The
techniques can roughly be divided into two categories, namely (1) where the
aim is to isolate genomic DNA fragments containing promoter activity randomly
by so-called promoter probe vector systems and (2) where the aim is to isolate
a gene per se from a genomic bank (library) and isolation of the corresponding
promoter follows therefrom.

In promoter probe vector systems, genomic DNA fragments are
randomly cloned in front of the coding sequence of a reporter gene that is
expressed only when the cloned fragment contains promoter activity (Neve,
R.L. et al., Nature 277:324-325 (1979)). Promoter probe vectors have been
designed for cloning of promoters in E. coli (An, G. et al., J. Bact.
140:400-407 (1979)) and other bacterial hosts (Band, L. et al., Gene
26:313-315 (1983); Achen, M.G., Gene 45:45-49 (1986)), yeast (Goodey, A.R.
et al., Mol. Gen. Genet. 204:505-511 (1986)) and mammalian cells (Pater, M.M.
et al., J. Mol. App. Gen. 2:363-371 (1984)). Because it is well known in the
art that Trichoderma promoters fail to work in E. coli and yeast (e.g.,
Penttilä, M.E. et al., Mol. Gen. Genet. 194:494-499 (1984)), these organisms
cannot be used as hosts to isolate Trichoderma promoters. Because the
transforming DNA integrates into the fungal genome in varying copy numbers
at random locations during the transformation of Trichoderma, application of
this method using Trichoderma itself as a cloning host is also unlikely to
succeed and would not be practical for efficient isolation of Trichoderma
promoters with the desired properties.

Known genes can be isolated from either a cDNA or chromosomal
gene bank (library) using hybridization as a detection method. Such
hybridization may be with a corresponding, homologous gene from another
organism (e.g., Vanhanen et al., Curr. Genet. 15:181-186 (1989)) or with a
probe designed on the basis of expected similarities in amino acid sequence.
If amino acid sequence is available for the corresponding protein, an
oligonucleotide can also be designed which can be used in hybridization for
isolation of the gene. If the gene is cloned into an expression bank, the
expression product of the gene can also be detected in that bank by
using specific antibodies or an activity test.

Specific genes can be isolated by using complementation of mutations
in E. coli or yeast (e.g., Keesey, J.K. et al., J. Bact. 152:954-958 (1982);
Kaslow, D.C., J. Biol. Chem. 265:12337-12341 (1990); Kronstad, J.W., Gene
79:97-106 (1989)), or complementation of corresponding mutants of
filamentous fungi, for instance by using SIB selection (Akins et al., Mol.
Cell. Biol. 5:2272-2278 (1985)).

However, a major concern is how to isolate specific genes that have the
desired promoter properties, for example genes which would be most highly
expressed when glucose is present in the medium. There is no information
available in the literature to indicate which genes are the most highly
expressed in an organism, and especially not in filamentous fungi. The
phosphoglycerate kinase (PGK) promoter from the yeast Saccharomyces
cerevisiae is considered to be a strong promoter for protein production.
However, results obtained by the inventors have shown that the corresponding
Trichoderma promoter is not suitable for such protein production. Thus, which
specific Trichoderma genes should be isolated in order to obtain the best
possible promoter for protein production in certain desired conditions is
unknown and cannot be predicted. Consequently one cannot rely
on any previous nucleotide or amino acid sequence information, nor
complement any previously known mutations, in gene isolation for such
purpose in Trichoderma.

Differential hybridization has been used for cloning of genes expressed
under certain conditions. The method relies on the screening of a bank
separately with an induced and a noninduced cDNA probe. By this method,
e.g., Trichoderma reesei genes strongly expressed during production of
cellulolytic enzymes have been isolated (Teeri, T. et al., Bio/Technology
1:696-699 (1983)). The differential hybridization methods used are based on
the idea that the genes searched for are expressed in certain conditions (like
cellulases on cellulose) but not in some other conditions (like cellulases on
glucose), which enables picking up clones hybridizing with only one of the
cDNA probes used. However, for isolation of the genes expressed strongly
on glucose, this approach (expression on glucose and not on some other
media) is not a suitable one, and might in fact result in not finding the most
highly expressed genes. This is because when differentially screening a
chromosomal bank, only induced genes are selected. Such induced genes are
not necessarily the most strongly expressed genes. Thus, no method is known
in the art which would permit the identification of promoters which function
strongly in Trichoderma on glucose medium.

Another option for obtaining a promoter with desired properties is to
modify the already existing ones. This is based on the fact that the function
of a promoter is dependent on the interplay of regulatory proteins which bind
to specific, discrete nucleotide sequences in the promoter, termed motifs.
Such interplay subsequently affects the general transcription machinery and
regulates transcription efficiency. These proteins are positive regulators or
negative regulators (repressors), and one protein can have a dual role
depending on the context (Johnson, P.F. and McKnight, S.L., Annu. Rev.
Biochem. 58:799-839 (1989)). However, even a basic understanding of the
regions responsible for regulation of a promoter requires a considerable
amount of experimental data, and data obtained from the corresponding
promoter of another organism is usually not useful (see Vanhanen, S. et al.,
Gene 106:129-133 (1991)), or at least not sufficient, to explain the function
of a promoter originating from another organism.

WO 94/04673    PCT/FI93/00330

II. Translation Elongation Factors

Translation Elongation Factors (TEFs) are universally conserved
proteins that promote the GTP-dependent binding of an aminoacyl-tRNA to the
ribosomal A-site in protein synthesis. Especially conserved is the N-terminus
of the protein containing the GTP binding domain. TEFs are known as very
abundant proteins in cells, comprising about 4-6% of total soluble proteins
(Miyajima, I. et al., J. Biochem. 83:453-462 (1978); Thiele, D. et al., J.
Biol. Chem. 260:3084-3089 (1985)).

tef genes have been isolated from several organisms. In some of them
they constitute a multigene family. Also a number of pseudogenes have been
isolated from some organisms. The promoter of the human tef gene can direct
transcription in vitro at least 2-fold more effectively than the adenovirus major
late promoter, which indicates that the tef promoter is a strong promoter in
mammalian expression systems (Uetsuki et al., J. Biol. Chem. 264:5791-5798
(1989)). Both the human and the A. thaliana tef1 promoters (for translation
elongation factor EF-1α) have been used in expression systems with high
efficiency of gene expression (Kim et al., Gene 91:217-223 (1990); Curie
et al., Nucl. Acids Res. 19:1305-1310 (1991)). In both cases the full
expression of the promoter was dependent on the presence of the intron in the
5' noncoding region.

tef is quite constitutively expressed, the major exception being its expression 
in aging and quiescent cells. It is not known to be regulated by the growth 
substrates of the host. 



III. Expression of Recombinant Proteins in Trichoderma

The filamentous fungus Trichoderma reesei is an efficient producer of
hydrolases, especially of different cellulose degrading enzymes. Due to its
excellent capacity for protein secretion and developed methods for industrial
cultivations, Trichoderma is a powerful host for production of heterologous,
recombinant proteins in large scale. The efficient production of both
homologous and heterologous proteins in fungi relies on fungal promoters.
The promoter of the main cellulase gene of Trichoderma, cellobiohydrolase 1
(cbh1), has been used for production of heterologous proteins in Trichoderma
grown on media containing cellulose or its derivatives (Harkki et al.,
Bio/Technology 7:596-603 (1989); Saloheimo et al., Bio/Technology 9:987-990
(1991)). The cbh1 promoter cannot be used when Trichoderma is grown
on glucose containing media due to glucose repression of cbh1 promoter
activity. This regulation occurs at the transcriptional level and thus glucose
repression could be mediated through the promoter sequences. It is also
known that the cellulase genes cbh1, cbh2, egl1 and egl2 are coexpressed in
various growth conditions; it is thus presumable that the same regulatory
factors operate on fairly similar promoter sequences mediating similar
functions. However, nothing is yet known of the mechanism of glucose
repression at the promoter level in filamentous fungi.

Glucose repression in the yeast Saccharomyces cerevisiae has been
studied for many years. These studies have however failed, until recently, to
identify binding sequences in promoters or regulatory proteins binding to
promoters which would mediate glucose repression. The first glucose
repressor protein and its binding sequence in eukaryotic cells were
reported by Nehlin and Ronne (Nehlin, J.O. and Ronne, H., EMBO J.
9:2891-2899 (1990)). This MIG1 protein seems to be responsible for one fifth
of the glucose repression of GAL genes in Saccharomyces cerevisiae, other
factors still being required to obtain the full glucose repression effect
(Nehlin, J.O. et al., EMBO J. 10:3373-3377 (1991)).



Thus, it is desirable to be able to produce proteins in Trichoderma 
grown on glucose. Not only is the substrate glucose cheap and readily 
available, but also Trichoderma produces less protease activity when grown 
on glucose. Further, cellulase production is repressed when Trichoderma is 
grown on glucose, thus allowing for the easier purification of the desired 
product from the Trichoderma medium. Nevertheless, to date there has been 
no identification or characterization of any promoter that is highly functional 
in Trichoderma grown on glucose. In addition, no modifications of the
normally glucose repressed promoter, the cbh1 promoter, have been identified
which would allow the use of this strong promoter for expression of
heterologous genes in Trichoderma grown on glucose.

Summary of the Invention 

This invention is first directed to the identification of the motif, the
DNA element, that imparts glucose repression onto the Trichoderma cbh1
promoter.

The invention is further directed to a modified Trichoderma cbh1
promoter, such modified promoter lacking such glucose repression element
and such modified promoter being useful for the production of proteins,
including cellulases, when the host is grown on glucose medium.

The invention is further directed to a method for the isolation of genes 
that are highly expressed on glucose, especially from filamentous fungal hosts 
such as Trichoderma. 

The invention is further directed to five such previously undescribed 
genes and their promoters from Trichoderma reesei. 

The invention is further directed to specific cloning vectors for 
Trichoderma containing the above mentioned sequences. 

The invention is further directed to filamentous fungal strains 
transformed with said vectors, which strains thus are able to produce proteins 
such as cellulases on glucose. 

The invention is further directed to a process for producing cellulases
or other useful enzymes on glucose.

Brief Description of the Drawings 

Figure 1 shows the plasmid pTHN1 which carries the tef1 promoter
and 5' part of the coding region and shows the relevant features of the tef1
gene and the sequenced areas. Figure 1A is the nucleotide sequence of the
tef1 promoter and coding sequence [TEF001; SEQ ID 1]. The promoter
sequence stops at base number 1234. The methionine codon of the start site
of translation is located at base numbers 1235-1237 and is underlined. The
total number of bases shown is 3461. The DNA sequence composition is
850A, 1044C, 860G, 697T, and 10 other.

Figure 2 shows the plasmid pEA33 which carries the tef1 promoter and
the coding region with relevant features.

Figure 3 shows the plasmid pTHN3 which carries the promoter and
coding region of the clone cDNA1 and shows the relevant features. Figure
3A is the nucleotide sequence of the cDNA1 promoter and coding sequence
[SEQ ID 2]. The promoter sequence stops at base number 1157. The
methionine codon of the start site of translation is located at base numbers
1158-1160 as numbered in Figure 3A and is underlined.

Figure 4 shows the plasmid pEA10 which carries the promoter and
coding region of the clone cDNA10 and the relevant regions and sequenced
areas. Diagonally hatched = insert; solid line = sequenced region (genomic
DNA); squared criss-crossed = sequenced region (cDNA). Not all EcoRV
and NdeI sites are shown. Figure 4A is the nucleotide sequence of the
cDNA10 promoter and coding sequence [CDNA10SEQ; SEQ ID 3]. The
promoter sequence stops at base number 1522. The methionine codon of the
start site of translation is located at base numbers 1523-1525 and is underlined.
The total number of bases shown is 2868. The DNA sequence composition
is 760A, 765C, 675G and 668T.

Figure 5 shows the plasmid pEA12 which carries the clone cDNA12
and relevant features and sequenced areas. Diagonally hatched = insert; solid
line = sequenced region (genomic DNA); squared criss-crossed = sequenced
region (cDNA). 1 = unsequenced intron region. Note: AvaI is not a unique
site. Figure 5A is the nucleotide sequence of the cDNA12 promoter and
coding sequence [A12DNA; SEQ ID 4]. The promoter sequence stops at base
number 1101. The methionine codon of the start site of translation is located
at base numbers 1102-1104 and is underlined. The total number of bases is
2175. The DNA sequence composition is 569A, 602C, 480G, 519T and 5
other.

Figure 6 shows the plasmid pEA155 which carries the promoter and
coding region of the clone cDNA15 and the relevant features and sequenced
areas. Diagonally hatched = insert; solid line = sequenced region (genomic
DNA); squared criss-crossed = sequenced region (cDNA). Not all PstI and
EcoRI sites are shown. Figure 6A is the nucleotide sequence of the cDNA15
promoter and coding sequence [SEQ ID 5]. The total number of bases is
2737. The DNA composition is 647A, 695C, 742G, 649T and 4 other.

Figure 7 shows plasmid pPLE3 which carries the egl1 cDNA. Just
above the plasmid map is the sequence of the adaptor molecule [SEQ ID 25]
that was constructed to remove the small SacII and Asp718 fragment from the
plasmid so as to construct an exact joint [SEQ ID 26, SEQ ID 27] between the
cbh1 promoter and the egl1 signal sequences [SEQ IDs 18 and 16]. Figure
7A shows the 1588 bp sequence of the egl1 cDNA (369A, 527C, 418G and
274T) [SEQ ID 16]. Figure 7B shows the sequence of the 745 bp cbh1
terminator of pPLE131 (198A, 191C, 177G, and 179T) [SEQ ID 23].

Figure 8 shows construction of plasmid pEM-3A and SEQ ID 28. The 
"A" on the plasmid maps denotes the EGI tail sequence and the "B" denotes 
the EGI hinge sequence. 

Figure 9 shows the plasmid pTHN100B for expression of the EGIcore
under the tef1 promoter and SEQ ID 28.

Figure 10 shows production of EGIcore from the plasmid pTHN100B
into the culture medium of the host strain QM9414, analyzed with EGI specific
antibodies on a slot blot. Lane 1: pTHN100B-16b, 200 µl glucose
supernatant; lane 2: QM9414, 200 µl glucose supernatant; lane 3: TBS; lane
4: QM9414, 200 µl Solka floc 1:500 diluted supernatant; lane 5: QM9414, 200
µl Solka floc 1:5,000 diluted supernatant; lane 6: QM9414, 200 µl Solka floc
1:10,000 diluted supernatant; lane 7: pTHN100B-16b, 200 µl glucose 1:5
diluted supernatant; lane 8: QM9414, 200 µl glucose 1:5 diluted supernatant;
lane 9: 200 ng EGI protein; lane 10: 100 ng EGI protein; lane 11: 50 ng EGI
protein; and lane 12: 25 ng EGI protein.

Figure 11 shows Western blotting with EGI specific antibodies of
culture medium of the strain pTHN100B-16c grown in whey-spent grain or
glucose medium, and of EGIcore purified from the glucose medium. Lane 1:
pTHN100B-16c, 10 µl whey spent grain supernatant; lane 2: pTHN100B-16c,
5 µl whey spent grain supernatant; lanes 3-5: EGIcore purified from
pTHN100B-16c glucose fermentation; lane 6: pTHN100B-16c, 15 µl glucose
fermenter supernatant, concentrated 100x; lane 7: pTHN100B-16c, 7.5 µl
glucose fermenter supernatant, concentrated 100x; and lane 8: low molecular
weight markers at 94 kDa, 67 kDa, 43 kDa, 30 kDa and 20.1 kDa (bands 1-5
starting from lane 8, top of gel).

Figure 12 shows Western blotting of culture medium of the strain
pTHN100B-16c grown on glucose medium. Lane 1: EGI protein, about 540
ng; lane 2: EGI protein, about 220 ng; lane 3: EGI protein, about 110 ng;
lane 4: pTHN100B-16c, 30 µl glucose fermenter supernatant; lane 5:
pTHN100B-16c, 30 µl glucose fermenter supernatant, concentrated 4.2x; lane
6: low molecular weight markers at 94 kDa, 67 kDa, 43 kDa, 30 kDa and 20.1
kDa (bands 1-5 starting from lane 6, top of gel).

Figure 13 diagrams the elements of the plasmid pMLO16. Figure 13A
is the sequence of the cbh1 promoter of plasmid pMLO16 [SEQ ID 18]. Figure
13B is the sequence of the T. reesei cbh1 terminator on plasmid pMLO16 and
plasmids derived from it [SEQ ID 24].

Figure 14 shows the expression of β-galactosidase on glucose medium
in pMLO16del5(11)-transformants of Trichoderma reesei QM 9414 (A2-F5).
A1: QM 9414 host strain; C1 and E1: QM 9414 transformant in which one
copy of the β-galactosidase expression cassette with intact cbh1 promoter has
replaced the cbh1 locus; B1, D1 and F1: empty wells.

Figure 15 shows the restriction map of the plasmid pMLO16del5(11),
which carries the shortened form of the cbh1 promoter fused to the lacZ gene
and the cbh1 terminator. Figure 15A is the sequence of the truncated cbh1
promoter [(pMLO16del5(11)); SEQ ID 19]. The polylinker is underlined. The
arrow denotes the deletion site.

Figure 16 shows the restriction map of the plasmid pMLO17, which
carries the shortened form of the cbh1 promoter fused to the cbh1
chromosomal gene. The restriction sites marked with a superscripted cross
"+" are not single sites. There are two additional EcoRI sites in the cbh1 gene
that are not shown. Figure 16A shows the sequence of the KspI-XmaI
fragment (the underlined portion) that contains the chromosomal cbh1 gene
[SEQ ID 17].

Figure 17 shows the expression of CBHI on glucose medium in
pMLO17 transformants of Trichoderma reesei QM 9414. A collection of
single spore cultures (number and a letter-code) and different control samples
are shown.

Figure 18 shows specific mutations of mig-like sequences (M) in cbh1
promoters of pMI-24, pMI-25, pMI-26, pMI-27 and pMI-28. The promoters
shown here were fused to the lacZ gene and cbh1 terminator as described for
pMLO16 (see Figure 13) or pMLO16del0(2) (see Figure 19). *: sequence
alteration made in the cbh1 promoter in different combinations. At position
-1505 to -1500 the genomic sequence is 5'-CTGGGG and the altered sequence
is 5'-TCTAAA. At position -1001 to -996 the genomic sequence is 5'-CTCGGG
and the altered sequence is 5'-TCTAAA. At position -720 to -715 the genomic
sequence is 5'-GTGGGG and the altered sequence is 5'-TCTAGA.
pMLO16del0(2) was used as a starting vector for pMI-25, pMI-26, pMI-27
and pMI-28, pMLO16 for pMI-24. v = the polylinker. Figure 18A is the
sequence of the altered cbh1 promoter of pMI-24 (PMI27PROM)
([SEQ ID 20]). The total number of bases is 1776. The sequence composition
is 487A, 399C, 434G, and 456T. The polylinker is underlined and the
sequence alteration is boxed. Figure 18B is the sequence of the altered cbh1
promoter of pMI-27 ([SEQ ID 21]). The polylinker is underlined, the arrow
denotes the deletion point and the sequence alterations are boxed. Figure 18C
is the sequence of the altered cbh1 promoter of pMI-28 (PMI28PROM)
([SEQ ID 22]). The polylinker is underlined, the arrow denotes the deletion
point and the sequence alterations are boxed. The total number of bases is
1776. The sequence composition is 490A, 399C, 430G and 457T.

Figure 19 shows the restriction map of the plasmid pMLO16del0(2),
which carries the shortened form of the cbh1 promoter fused to the lacZ gene
and the cbh1 terminator.

Figure 20 shows the expression of β-galactosidase on the indicated medium
in Trichoderma reesei QM9414 transformed with pMLO16del0(2), pMI-25,
pMI-27, pMI-28, pMLO16 and pMI-24.
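
Several of the figure descriptions above state a total base count together
with a base composition. As an illustrative aid only, and not part of the
disclosure, the following Python sketch shows how such a composition can be
tallied and how the stated totals can be cross-checked. The function names
(base_composition, is_consistent) are hypothetical; the numbers in STATED are
copied from the descriptions of Figures 1A, 4A, 5A and 6A.

```python
from collections import Counter

def base_composition(seq):
    """Count A, C, G, T and any other symbol in a sequence string."""
    cleaned = "".join(seq.split()).upper()   # drop whitespace and line breaks
    counts = Counter(cleaned)
    comp = {base: counts.get(base, 0) for base in "ACGT"}
    comp["other"] = sum(n for b, n in counts.items() if b not in "ACGT")
    return comp

def is_consistent(comp, total):
    """True when the per-base counts add up to the stated total."""
    return sum(comp.values()) == total

# Compositions and totals as stated in the figure descriptions.
STATED = {
    "Figure 1A": ({"A": 850, "C": 1044, "G": 860, "T": 697, "other": 10}, 3461),
    "Figure 4A": ({"A": 760, "C": 765, "G": 675, "T": 668, "other": 0}, 2868),
    "Figure 5A": ({"A": 569, "C": 602, "G": 480, "T": 519, "other": 5}, 2175),
    "Figure 6A": ({"A": 647, "C": 695, "G": 742, "T": 649, "other": 4}, 2737),
}

checks = {fig: is_consistent(comp, total) for fig, (comp, total) in STATED.items()}
```

A sequence read from a listing can be passed to base_composition and the
result compared with the stated figures in the same way.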

Detailed Description of the Preferred Embodiments 

I. Identification of Fungal Genes that Express on Glucose Medium 

In the following description, reference will be made to various
methodologies known to those of skill in the art of molecular genetics and
biology. Publications and other materials setting forth such known
methodologies to which reference is made are incorporated herein by reference
in their entireties as though set forth in full.

General principles of the biochemistry and molecular biology of the
filamentous fungi are set forth, for example, in Finkelstein, D.B. et al., eds.,
Biotechnology of Filamentous Fungi: Technology and Products, Butterworth-
Heinemann, publishers, Stoneham, MA (1992) and Bennett, J.W. et al., More
Gene Manipulations in Fungi, Academic Press - Harcourt Brace Jovanovich,
publishers, San Diego CA (1991).

To be able to develop versatile systems for protein production from
Trichoderma, especially when Trichoderma are grown on glucose, a method
has been developed for the isolation of previously unknown Trichoderma genes
which are highly expressed on glucose, and their promoters. The method of
the invention requires the use of only one cDNA population of probes.

It is to be understood that the method of the invention would be useful
for the identification of promoter sequences that are active under any desired
environmental condition to which a cell could be exposed, and not just to the
exemplified isolation of promoters that are capable of expression in glucose
medium. By "environmental condition" is meant the presence of a physical
or chemical agent, such agent being present in the cellular environment, either
extracellularly or intracellularly. Physical agents would include, for example,
certain growth temperatures, especially a high or low temperature. Chemical
agents would include any compound or mixtures including carbon growth
substrates, drugs, atmospheric gases, etc.

According to the method of the invention, the organism is first grown
under the desired growth condition, such as the use of glucose as a carbon
source. Total mRNA is then extracted from the organism and preferably
purified through at least a polyA+ enrichment of the mRNA from the total
RNA population. A cDNA bank is made from this total mRNA population
using reverse transcriptase and the cDNA population cloned into any
appropriate vector, such as the commercially available lambda-ZAP vector
system (Stratagene). When using the lambda-ZAP vector system, or any
lambda vector system, the cDNA is packaged such that it is suitable for
infection of any E. coli strain susceptible to lambda bacteriophage infection.

The cDNA bank is transferred by standard colony hybridization
techniques onto nitrocellulose filters for screening. The bank is plated and
plaque lifts are taken onto nitrocellulose. The bank is screened with a
population of labelled cDNAs that had been synthesized against the same RNA
population from which the cloned cDNA bank was constructed, using stringent
hybridization conditions. It should be noted that the genes are not expressed
in any way during this selection process. This results in clones hybridizing
with varying intensity and the ones showing the strongest signals are picked.
Genes that are most strongly expressed in the original population comprise the
majority of the total mRNA pool and thus give a strong signal in this
selection.

The inserts in clones with the strongest signals are sequenced from the
3' end of the insert using any standard DNA sequencing technique as known
in the art. This provides a first identification of each clone and allows the
exclusion of identical clones. The frequency with which each desired clone
is represented in the cDNA lambda-bank is determined by hybridizing the
bank against a clone-specific PCR probe. The desired clones are those which,
in addition to having the strongest signals as above, are also represented at the
highest frequencies in the cDNA bank, since this implies that the abundance
of the mRNA in the population was relatively high and thus that the promoter
for that gene was highly active under the growth conditions. Thus, the
relevance of this approach and any clone identified therefrom can be
double-checked: the intensity of the hybridization signal of a specific clone
should correlate positively with the frequency with which that clone is found
in the cDNA bank. The inserts of the clones selected in this manner, such
inserts corresponding to the cDNA sequences, may be used as probes to isolate
the corresponding genes and their promoters from a chromosomal bank, such as
one cloned into lambda as above.
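
The double check described above can be illustrated with a short Python
sketch. It is offered only as an aid to understanding and is not part of the
disclosed method: the clone names, signal values and plaque counts are
invented, and the helper names (bank_frequency, pearson, rank_candidates) are
hypothetical. The sketch ranks clones by hybridization signal and by their
frequency in the bank, and computes a correlation coefficient between the two.

```python
from math import sqrt

# Hypothetical screening records, invented for illustration only:
# (clone, relative hybridization signal,
#  plaques hybridizing to the clone-specific probe, plaques screened)
SCREEN = [
    ("cloneC", 0.40, 9, 2000),
    ("cloneA", 0.95, 60, 2000),
    ("cloneD", 0.15, 2, 2000),
    ("cloneB", 0.80, 42, 2000),
]

def bank_frequency(hits, screened):
    """Fraction of plaques in the cDNA bank that correspond to one clone."""
    return hits / screened

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def rank_candidates(records):
    """Sort clones by signal, then by bank frequency, strongest first."""
    rows = [(name, signal, bank_frequency(hits, screened))
            for name, signal, hits, screened in records]
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)

ranked = rank_candidates(SCREEN)
signals = [row[1] for row in ranked]
frequencies = [row[2] for row in ranked]
signal_vs_frequency = pearson(signals, frequencies)
```

With data of this kind, a clearly positive value of signal_vs_frequency is
what the double check expects; a clone with a strong signal but a low bank
frequency would stand out as a candidate to re-examine.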

The method of the invention is not limited to Trichoderma, but would
be useful for cloning genes from any host, or from a specific tissue within
such host, from which a cDNA bank may be constructed, including prokaryotic
(bacterial) hosts and any eukaryotic host such as plants, mammals, insects,
yeast, and any cultured cell populations.

For example, using the method of the invention, five genes that express
relatively high levels of mRNA in Trichoderma reesei when such Trichoderma
are grown on glucose were identified. These genes were sequenced and
identified as clone cDNA33, cDNA1, cDNA10, cDNA12, and cDNA15.
When used to screen a Trichoderma chromosomal lambda-bank, the
corresponding genes and their promoters were identified. Such genes and
promoters (or portions thereof) may then be subcloned into any desired vector,
such as the pSP73 vector (Promega, Madison, WI, USA).

According to the invention, the clones containing the genes and their 
promoters (or parts of them) highly expressed in Trichoderma grown on 
glucose are represented as follows: 



Plasmid    Figure    cDNA      Figure    SEQ ID No.

pTHN1      1A        cDNA33    1B        1
pEA33      2         cDNA33    1B        1
pTHN3      3A        cDNA1     3B        2
pEA10      4A        cDNA10    4B        3
pEA12      5A        cDNA12    5B        4
pEA155     6A        cDNA15    6B        5



One of the genes isolated according to the invention as being highly
expressed when Trichoderma was grown on glucose has been identified as the
one encoding Trichoderma translation elongation factor 1α (tef1). In addition,
four other, new genes have been identified for the first time that are highly
expressed on glucose in Trichoderma.

These data show that the method used in this invention resulted in
isolating five genes, one of which (tef1) is known to be efficiently expressed
in other organisms. However, the tef1 gene was not the most highly
expressed of the five genes isolated from the Trichoderma cDNA bank by the
method of the invention.

Of the five genes isolated, only tef1 shows a relevant degree of
homology to any known protein sequences. All of the genes isolated are also
expressed on other carbon sources and would not have been found with the
classical method of differential cloning. This shows the importance of the
method used in this invention in isolation of the most suitable genes for a
specific purpose, such as for isolation of strong promoters for expression on
glucose containing medium.

The promoter of any of these genes may be operably linked to a 
sequence heterologous to such promoter, and especially heterologous to the 
host Trichoderma, for expression of such gene from a Trichoderma host that 
is grown on glucose. Preferably, the coding sequence provides a secretion 
signal for secretion of the recombinant protein into the medium. 

Use of the promoters of the invention allows for the expression of genes
from Trichoderma under conditions in which there are no cellulases and
relatively few proteases. Thus, for the first time, recombinant genes can be
highly expressed in Trichoderma using a glucose-based growth medium.

The promoters of the invention, while being strongly expressed on 
glucose (that is, when the filamentous fungal host is grown on medium 
providing glucose as a carbon and energy source), are not repressed in the 
absence of glucose. In addition, they are active when the Trichoderma host 
is grown on carbon sources other than glucose. 

The glucose promoters of the invention, and those identified by the
methods of the invention, can be used to produce enzymes native to
Trichoderma itself, especially those capable of hydrolysing different kinds
of plant material. On glucose, the fungus does not naturally produce these
enzymes and consequently one or more specific hydrolytic enzymes could be
produced on glucose medium free from other plant material hydrolysing
enzymes. This would result in an enzyme preparation or enzyme mixtures for
specific applications.

II. Modification of the Cellobiohydrolase I Promoter

This invention also describes a method for the modification of the
cellobiohydrolase 1 promoter (cbh1) such that the activity of the promoter is
retained but the promoter no longer is repressed when cells are grown on
glucose-containing medium. Essentially, the DNA motif that imparted glucose
repression has been identified and removed from this promoter, allowing
production of desired proteins whose coding sequences are operably linked to
the promoter in suitable hosts, such as Trichoderma. Such a modified cbh1
promoter is termed a derepressed cbh1 promoter. As above, when the
recombinant organisms obtained from transformation with such constructs are
cultivated on glucose containing medium, any protein, including a cellulase,
may be produced without production of other plant material hydrolysing
enzymes, especially of native cellulases.

Isolated glucose promoters or the derepressed cbh1 promoter can be used
for instance to produce separate individual cellulases in hosts grown on
glucose without any simultaneous production of other hydrolases such as other
cellulases, hemicellulases, xylanases etc. or to produce heterologous proteins
in varying growth media.

III. Preparation of Coding Sequences Operably Linked to the
Promoter Sequences of the Invention

The process for genetically engineering a coding sequence, for
expression under a promoter of the invention, is facilitated through the
isolation and partial sequencing of pure protein encoding an enzyme of interest
or by the cloning of genetic sequences which are capable of encoding such
protein with polymerase chain reaction technologies; and through the
expression of such genetic sequences. As used herein, the term "genetic
sequences" is intended to refer to a nucleic acid molecule (preferably DNA).
Genetic sequences that are capable of encoding a protein are derived from a
variety of sources. These sources include genomic DNA, cDNA, synthetic
DNA, and combinations thereof. The preferred source of genomic DNA is
a fungal genomic bank. The preferred source of the cDNA is a cDNA bank
prepared from mRNA of fungi grown under conditions known to induce
expression of the desired gene to produce mRNA or protein. However, since
the genetic code is universal, a coding sequence from any host, including
prokaryotic (bacterial) hosts and any eukaryotic host (plants, mammals,
insects, yeasts, and any cultured cell populations), would be expected to
function (encode the desired protein).

Genomic DNA may or may not include naturally occurring introns.
Moreover, such genomic DNA may be obtained in association with the 5'
promoter region of the gene sequences and/or with the 3' transcriptional
termination region. According to the invention, however, the native promoter
region would be replaced with a promoter of the invention.

Such genomic DNA may also be obtained in association with the
genetic sequences which encode the 5' non-translated region of the mRNA
and/or with the genetic sequences which encode the 3' non-translated region.
To the extent that a host cell can recognize the transcriptional and/or
translational regulatory signals associated with the expression of the mRNA
and protein, then the 5' and/or 3' non-transcribed regions of the native gene,
and/or the 5' and/or 3' non-translated regions of the mRNA, may be retained
and employed for transcriptional and translational regulation.

Genomic DNA can be extracted and purified from any host cell,
especially a fungal host cell, which naturally expresses the desired protein by
means well known in the art. A genomic DNA sequence may be shortened
by means known in the art to isolate a desired gene from a chromosomal
region that otherwise would contain more information than necessary for the
utilization of this gene in the hosts of the invention. For example, restriction
digestion may be utilized to cleave the full-length sequence at a desired
location. Alternatively, or in addition, nucleases that cleave from the 3'-end
of a DNA molecule may be used to digest a certain sequence to a shortened
form, the desired length then being identified and purified by gel
electrophoresis and DNA sequencing. Such nucleases include, for example,
Exonuclease III and Bal31. Other nucleases are well known in the art.
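
As an illustration of shortening a sequence by restriction digestion, the following minimal Python sketch locates every occurrence of a recognition site on one strand and returns the resulting fragments. The function name digest, the example sequence, and the choice of the EcoRI site (GAATTC, cut after the first G) are assumptions made for illustration only; the text above does not name a particular enzyme, and the sketch models only the top strand of a linear molecule.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear top-strand sequence at every occurrence of `site`.

    `cut_offset` is the number of bases into the site at which the
    strand is cleaved (1 for EcoRI, which cuts G^AATTC).
    Returns the fragments in 5'->3' order.
    """
    sequence = sequence.upper()
    site = site.upper()

    # Collect the cut position for each occurrence of the site.
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)

    # Slice the sequence between successive cut positions.
    fragments = []
    previous = 0
    for cut in cut_positions:
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])
    return fragments


# Hypothetical 18-base example with two EcoRI sites.
example_fragments = digest("AAGAATTCGGGAATTCTT", "GAATTC", 1)
print(example_fragments)
```

In practice the fragment of the desired length would then be identified by gel electrophoresis, as described above; the sketch only predicts the expected fragment sizes.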

For cloning into a vector, such suitable DNA preparations (either
genomic DNA or cDNA) are randomly sheared or enzymatically cleaved,
respectively, and ligated into appropriate vectors to form a recombinant gene
(either genomic or cDNA) bank.

A DNA sequence encoding a desired protein or its functional 
derivatives may be inserted into a DNA vector in accordance with 
conventional techniques, including blunt-ending or staggered-ending termini 
for ligation, restriction enzyme digestion to provide appropriate termini, filling 
in of cohesive ends as appropriate, alkaline phosphatase treatment to avoid 
undesirable joining, and ligation with appropriate ligases. Techniques for such 
manipulations are disclosed by Maniatis, T. et al. (Molecular
Cloning (A Laboratory Manual), Cold Spring Harbor Laboratory, second
edition, 1988) and are well known in the art.

Libraries containing sequences coding for the desired gene may be 
screened and the desired gene sequence identified by any means which 
specifically selects for a sequence coding for such gene or protein such as, for 
example, a) by hybridization with an appropriate nucleic acid probe(s) 
containing a sequence specific for the DNA of this protein, or b) by 
hybridization-selected translational analysis in which native mRNA which 
hybridizes to the clone in question is translated in vitro and the translation 
products are further characterized, or, c) if the cloned genetic sequences are 
themselves capable of expressing mRNA, by immunoprecipitation of a 
translated protein product produced by the host containing the clone. 

Oligonucleotide probes specific for a certain protein which can be used 
to identify clones to this protein can be designed from the knowledge of the 
amino acid sequence of the protein or from the knowledge of the nucleic acid 
sequence of the DNA encoding such protein or a related protein. 
Alternatively, antibodies may be raised against purified forms of the protein 
and used to identify the presence of unique protein determinants in
transformants that express the desired cloned protein. When an amino acid
sequence is listed horizontally, unless otherwise stated, the amino terminus is 
intended to be on the left end and the carboxy terminus is intended to be at the 
right end. Similarly, unless otherwise stated or apparent from the context, a 
nucleic acid sequence is presented with the 5' end on the left. 

Because the genetic code is degenerate, more than one codon may be 
used to encode a particular amino acid. Peptide fragments may be analyzed 
to identify sequences of amino acids that may be encoded by oligonucleotides 
having the lowest degree of degeneracy. This is preferably accomplished by 
identifying sequences that contain amino acids which are encoded by only a 
single codon. 
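
The search for a peptide stretch of lowest degeneracy can be sketched in Python as follows. This is a minimal illustration that assumes the standard genetic code; the function name least_degenerate_window and the example peptide are hypothetical and do not come from the specification.

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of codons that encode each amino acid (its degeneracy).
DEGENERACY = {}
for codon, aa in CODON_TABLE.items():
    DEGENERACY[aa] = DEGENERACY.get(aa, 0) + 1


def least_degenerate_window(peptide, length):
    """Return (start, fragment, degeneracy) for the stretch of `length`
    residues whose encoding oligonucleotide set is smallest."""
    best = None
    for start in range(len(peptide) - length + 1):
        fragment = peptide[start:start + length]
        degeneracy = prod(DEGENERACY[aa] for aa in fragment)
        if best is None or degeneracy < best[2]:
            best = (start, fragment, degeneracy)
    return best


# Hypothetical peptide: Met and Trp have one codon each, Leu has six.
print(least_degenerate_window("LSRMWKDA", 4))
```

Stretches rich in methionine and tryptophan score best because each is encoded by only a single codon.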

Although occasionally an amino acid sequence may be encoded by only 
a single oligonucleotide sequence, frequently the amino acid sequence may be 
encoded by any of a set of similar oligonucleotides. Importantly, whereas all 
of the members of this set contain oligonucleotide sequences which are capable 
of encoding the same peptide fragment and, thus, potentially contain the same 
oligonucleotide sequence as the gene which encodes the peptide fragment, only 
one member of the set contains the nucleotide sequence that is identical to the 
exon coding sequence of the gene. Because this member is present within the 
set, and is capable of hybridizing to DNA even in the presence of the other 
members of the set, it is possible to employ the unfractionated set of 
oligonucleotides in the same manner in which one would employ a single 
oligonucleotide to clone the gene that encodes the peptide. 
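
The set of oligonucleotides capable of encoding a given peptide fragment can be enumerated with a short Python sketch. It assumes the standard genetic code; the function name oligo_set and the example peptide are hypothetical. Exactly one member of the returned set matches the exon coding sequence of the gene, but which member that is cannot be known from the peptide alone.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Reverse lookup: amino acid -> list of codons encoding it.
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def oligo_set(peptide):
    """Return every coding-strand oligonucleotide that encodes `peptide`."""
    choices = [CODONS_FOR[aa] for aa in peptide]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical fragment Met-Trp-Lys-Asp: 1 x 1 x 2 x 2 = 4 members.
for oligo in oligo_set("MWKD"):
    print(oligo)
```

The unfractionated set can be used as a mixed probe in the manner described above, since the one exactly matching member hybridizes even in the presence of the others.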

Using the genetic code, one or more different oligonucleotides can be 
identified from the amino acid sequence, each of which would be capable of 
encoding the desired protein. The probability that a particular oligonucleotide 
will, in fact, constitute the actual protein encoding sequence can be estimated 
by considering abnormal base pairing relationships and the frequency with 
which a particular codon is actually used (to encode a particular amino acid) 
in eukaryotic cells. Using "codon usage rules," a single oligonucleotide 
sequence, or a set of oligonucleotide sequences, that contain a theoretical
"most probable" nucleotide sequence capable of encoding the protein 
sequences is identified. 
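
The selection of a "most probable" sequence by codon usage can be sketched as follows. The codon frequencies in HYPOTHETICAL_CODON_USAGE are invented placeholder values, not measured data for any organism, and the function name most_probable_oligo is likewise an assumption; a real application would substitute a codon usage table for the source organism. The sketch also ignores abnormal base pairing relationships, which the text mentions as a second consideration.

```python
# Placeholder codon frequencies per amino acid (NOT real data).
HYPOTHETICAL_CODON_USAGE = {
    "M": {"ATG": 1.0},
    "W": {"TGG": 1.0},
    "K": {"AAG": 0.8, "AAA": 0.2},
    "D": {"GAC": 0.7, "GAT": 0.3},
}


def most_probable_oligo(peptide, usage):
    """Pick the most frequently used codon for each residue.

    Returns the resulting oligonucleotide and the product of the chosen
    codon frequencies, a rough estimate of the chance that the guess
    matches the actual coding sequence at every position.
    """
    codons = []
    probability = 1.0
    for aa in peptide:
        codon, frequency = max(usage[aa].items(), key=lambda item: item[1])
        codons.append(codon)
        probability *= frequency
    return "".join(codons), probability


oligo, estimate = most_probable_oligo("MWKD", HYPOTHETICAL_CODON_USAGE)
print(oligo, round(estimate, 2))
```

Treating positions as independent is a simplification; it gives a single guess where the full degenerate set would otherwise be required.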

The suitable oligonucleotide, or set of oligonucleotides, which is 
capable of encoding a fragment of a certain gene (or which is complementary 
to such an oligonucleotide, or set of oligonucleotides) may be synthesized by 
means well known in the art (see, for example, Oligonucleotides and 
Analogues, A Practical Approach, F. Eckstein, ed., 1992, IRL Press, New 
York) and employed as a probe to identify and isolate a clone to such gene 
by techniques known in the art. Techniques of nucleic acid hybridization and 
clone identification are disclosed by Maniatis, T., et al., in: Molecular
Cloning, A Laboratory Manual, Cold Spring Harbor Laboratories, Cold Spring
Harbor, NY (1982), and by Hames, B.D., et al., in: Nucleic Acid
Hybridization, A Practical Approach, IRL Press, Washington, DC (1985).
Those members of the above-described gene bank which are found to be 
capable of such hybridization are then analyzed to determine the extent and 
nature of coding sequences which they contain. 

To facilitate the detection of a desired DNA coding sequence, the 
above-described DNA probe is labeled with a detectable group. Such 
detectable group can be any material having a detectable physical or chemical 
property. Such materials have been well-developed in the field of nucleic acid 
hybridization and in general most any label useful in such methods can be 
applied to the present invention. Particularly useful are radioactive labels, 
such as 32P, 3H, 14C, 35S, 125I, or the like. Any radioactive label may be
employed which provides for an adequate signal and has a sufficient half-life. 
If single stranded, the oligonucleotide may be radioactively labelled using 
kinase reactions. Alternatively, polynucleotides are also useful as nucleic acid 
hybridization probes when labeled with a non-radioactive marker such as 
biotin, an enzyme or a fluorescent group. 

Thus, in summary, the elucidation of a partial protein sequence
permits the identification of a theoretical "most probable" DNA sequence, or
a set of such sequences, capable of encoding such a peptide. By constructing
an oligonucleotide complementary to this theoretical sequence (or by 
constructing a set of oligonucleotides complementary to the set of "most 
probable" oligonucleotides), one obtains a DNA molecule (or set of DNA 
molecules), capable of functioning as a probe(s) for the identification and 
isolation of clones containing a gene. 
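
Constructing the oligonucleotide complementary to a theoretical coding sequence amounts to taking its reverse complement, as in this minimal Python sketch. The function name reverse_complement and the example sequence are hypothetical; only unambiguous A, C, G and T bases are handled.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the complementary strand, written 5'->3'."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


# Hypothetical coding-strand guess for the fragment Met-Trp-Lys-Asp.
coding_guess = "ATGTGGAAGGAC"
probe = reverse_complement(coding_guess)
print(probe)
```

Applying the same function to each member of a set of "most probable" oligonucleotides yields the corresponding set of complementary probes.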

In an alternative way of cloning a gene, a bank is prepared using an 
expression vector, by cloning DNA or, more preferably cDNA prepared from 
a cell capable of expressing the protein into an expression vector. The bank 
is then screened for members which express the desired protein, for example, 
by screening the bank with antibodies to the protein. 

The above discussed methods are, therefore, capable of identifying 
genetic sequences that are capable of encoding a protein or biologically active 
or antigenic fragments of this protein. The desired coding sequence may be 
further characterized by demonstrating its ability to encode a protein having 
the ability to bind antibody in a specific manner, the ability to elicit the
production of antibodies which are capable of binding to the native,
non-recombinant protein, the ability to provide an enzymatic activity to a cell that
is a property of the protein, and the ability to provide a non-enzymatic (but 
specific) function to a recipient cell, among others. 

In order to produce the recombinant protein in the vectors of the 
invention, it is desirable to operably link such coding sequences to the glucose 
regulatable promoters of the invention. When the coding sequence and the 
operably linked promoter of the invention are introduced into a recipient
eukaryotic cell (preferably a fungal host cell) as a non-replicating,
non-integrating DNA (or RNA) molecule, the expression of the encoded protein may
occur through the transient (nonstable) expression of the introduced sequence.

Preferably the coding sequence is introduced on a DNA molecule, such
as a closed circular or linear molecule that is incapable of autonomous
replication, preferably a linear molecule that integrates into the host chromosome.
Genetically stable transformants may be constructed with vector systems, or
transformation systems, whereby a desired DNA is integrated into the host
chromosome. Such integration may occur de novo within the cell or be
assisted by transformation with a vector which functionally inserts itself into
the host chromosome.

The gene encoding the desired protein operably linked to the promoter 
of the invention may be placed with a transformation marker gene in one 
plasmid construction and introduced into the host cells by transformation, or,
the marker gene may be on a separate construct for co-transformation with the 
coding sequence construct into the host cell. The nature of the vector will 
depend on the host organism. In the practical realization of the invention the 
filamentous fungus Trichoderma has been employed as a model. Thus, for 
Trichoderma and especially for T. reesei, vectors incorporating DNA that 
provides for integration of the expression cassette (the coding sequence 
operably linked to its transcriptional and translational regulatory elements) into 
the host's chromosome are preferred. It is not necessary to target the 
chromosomal insertion to a specific site. However, targeting the integration 
to a specific locus may be achieved by providing specific coding or flanking 
sequences on the recombinant construct, in an amount sufficient to direct 
integration to this locus at a relevant frequency. 

Cells that have stably integrated the introduced DNA into their 
chromosomes are selected by also introducing one or more markers which 
allow for selection of host cells which contain the expression vector in the 
chromosome, for example the marker may provide biocide resistance, e.g., 
resistance to antibiotics, or heavy metals, such as copper, or the like. The 
selectable marker gene can either be directly linked to the DNA gene 
sequences to be expressed, or introduced into the same cell by
co-transformation. A genetic marker especially useful for the transformation of
the hosts of the invention is amdS, encoding acetamidase and thus enabling
Trichoderma to grow on acetamide as the only nitrogen source. Selectable
markers for use in transforming filamentous fungi include, for example,
acetamidase (the amdS gene), benomyl resistance, oligomycin resistance,
hygromycin resistance, aminoglycoside resistance, bleomycin resistance; and,
with auxotrophic mutants, ornithine carbamoyltransferase (OCTase or the argB
gene). The use of such markers is also reviewed in Finkelstein, D.B. in:
Biotechnology of Filamentous Fungi: Technology and Products, Chapter 6,
Finkelstein, D.B. et al., eds., Butterworth-Heinemann, publishers, Stoneham,
MA, (1992), pp. 113-156.

To express a desired protein and/or its active derivatives, 
transcriptional and translational signals recognizable by an appropriate host are 
necessary. The cloned coding sequences, obtained through the methods 
described above, and preferably in a double-stranded form, may be operably 
linked to sequences controlling transcriptional expression in an expression 
vector, and introduced into a host cell, either prokaryote or eukaryote, to 
produce recombinant protein or a functional derivative thereof. Depending 
upon which strand of the coding sequence is operably linked to the sequences 
controlling transcriptional expression, it is also possible to express antisense 
RNA or a functional derivative thereof. 

Expression of the protein in different hosts may result in different post- 
translational modifications which may alter the properties of the protein. 
Preferably, the present invention encompasses the expression of the protein or 
a functional derivative thereof, in eukaryotic cells, and especially in fungi.

A nucleic acid molecule, such as DNA, is said to be "capable of 
expressing" a polypeptide if it contains expression control sequences which 
contain transcriptional regulatory information and such sequences are 
"operably linked" to the nucleotide sequence which encodes the polypeptide.

An operable linkage is a linkage in which a sequence is connected to 
a regulatory sequence (or sequences) in such a way as to place expression of 
the sequence under the influence or control of the regulatory sequence. Two 
DNA sequences (such as a coding sequence and a promoter region sequence 
linked to the 5' end of the coding sequence) are said to be operably linked if 
induction of promoter function results in the transcription of mRNA encoding 
the desired protein and if the nature of the linkage between the two DNA 
sequences does not (1) result in the introduction of a frame-shift mutation,
(2) interfere with the ability of the expression regulatory sequences to direct
the expression of the protein or antisense RNA, or (3) interfere with the ability
of the DNA template to be transcribed. Thus, a promoter region would be
operably linked to a DNA sequence if the promoter was capable of effecting 
transcription of that DNA sequence. 

The precise nature of the regulatory regions needed for gene expression 
may vary between species or cell types, but shall in general include, as 
necessary, 5' non-transcribing and 5' non-translating (non-coding) sequences 
involved with initiation of transcription and translation respectively, such as 
the TATA box, capping sequence, CAAT sequence, and the like, with those 
elements necessary for the promoter sequence being provided by the promoters 
of the invention. Such transcriptional control sequences may also include 
enhancer sequences or upstream activator sequences, as desired. 

Expression of a protein in eukaryotic hosts such as fungus requires the 
use of regulatory regions functional in such hosts, and preferably fungal 
regulatory systems. A wide variety of transcriptional and translational regu- 
latory sequences can be employed, depending upon the nature of the host. 
Preferably, these regulatory signals are associated in their native state with a 
particular gene which is capable of a high level of expression in the host cell. 

In eukaryotes, where transcription is not linked to translation, such 
control regions may or may not provide an initiator methionine (AUG) codon, 
depending on whether the cloned sequence contains such a methionine. Such 
regions will, in general, include a promoter region sufficient to direct the 
initiation of RNA synthesis in the host cell. Promoters from filamentous 
fungal genes which encode an mRNA product capable of translation are
preferred, and especially, strong promoters can be employed provided they 
also function as promoters in the host cell. 

As is widely known, translation of eukaryotic mRNA is initiated at the 
codon which encodes the first methionine. For this reason, it is preferable to 
ensure that the linkage between a eukaryotic promoter and a DNA sequence 
which encodes the desired protein, or a functional derivative thereof, does not
contain any intervening codons which are capable of encoding a methionine.
The presence of such codons results either in the formation of a fusion protein
(if the AUG codon is in the same reading frame as the protein-coding DNA 
sequence) or a frame-shift mutation (if the AUG codon is not in the same 
reading frame as the protein-coding sequence). 
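
The check for intervening methionine codons can be sketched in Python as follows. The sketch assumes that the sequence passed in is the DNA lying between the promoter and the intended start codon, so that the intended ATG begins immediately after its last base; the function name find_upstream_atgs and the example sequences are hypothetical. Following the text, an upstream ATG is reported as in-frame when its distance to the intended start codon is a multiple of three, and no check is made for stop codons in between.

```python
def find_upstream_atgs(leader):
    """List every ATG in `leader` with its frame relative to the
    intended start codon, which is assumed to begin at len(leader)."""
    leader = leader.upper()
    hits = []
    position = leader.find("ATG")
    while position != -1:
        distance = len(leader) - position
        frame = "in-frame" if distance % 3 == 0 else "out-of-frame"
        hits.append((position, frame))
        position = leader.find("ATG", position + 1)
    return hits


# Hypothetical linker sequences.
print(find_upstream_atgs("ATGCCC"))        # ATG six bases upstream
print(find_upstream_atgs("CCATGGCATTTT"))  # ATG ten bases upstream
print(find_upstream_atgs("CCCTTT"))        # no upstream ATG
```

An empty result indicates a linkage that satisfies the preference stated above; any hit marks a position that would need to be altered or removed.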

It may be desired to construct a fusion product that contains a partial 
coding sequence (usually at the amino terminal end) of a protein and a second 
coding sequence (partial or complete) of a second protein. The first coding 
sequence may or may not function as a signal sequence for secretion of the 
protein from the host cell. For example, the sequence coding for desired 
protein may be linked to a signal sequence which will allow secretion of the 
protein from, or the compartmentalization of the protein in, a particular host. 
Such fusion protein sequences may be designed with or without specific 
protease sites such that a desired peptide sequence is amenable to subsequent 
removal. In a preferred embodiment, the native signal sequence of a fungal 
protein is used, or a functional derivative of that sequence that retains the 
ability to direct the secretion of the peptide that is operably linked to it. 
Aspergillus leader/secretion signal elements also function in Trichoderma.

If desired, the non-transcribed and/or non-translated regions 3' to the 
sequence coding for a desired protein can be obtained by the above-described 
cloning methods. The 3'-non-transcribed region may be retained for its
transcriptional termination regulatory sequence elements, or for those elements
which direct polyadenylation in eukaryotic cells. Where the native expression
control sequence signals do not function satisfactorily in a host cell, then
sequences functional in the host cell may be substituted.

The vectors of the invention may further comprise other operably 
linked regulatory elements such as DNA elements which confer antibiotic 
resistance, or origins of replication for maintenance of the vector in one or 
more host cells. 

Factors of importance in selecting a particular plasmid or viral vector
include: the ease with which recipient cells that contain the vector may be
recognized and selected from those recipient cells which do not contain the
vector; the number of copies of the vector which are desired in a particular
host; and whether it is desirable to be able to "shuttle" the vector between host
cells of different species.

Once the vector or DNA sequence containing the construct(s) is
prepared for expression, the DNA construct(s) is introduced into an
appropriate host cell by any of a variety of suitable means, including
transformation. After the introduction of the vector, recipient cells are grown
in a selective medium, which selects for the growth of vector-containing cells.
If this medium includes glucose, expression of the cloned gene sequence(s)
results in the production of the desired protein, or in the production of a
fragment of this protein as desired. This expression can take place in a
continuous manner in the transformed cells, or in a controlled manner, for
example, by induction of expression.

Fungal transformation is also carried out according to techniques
known in the art, for example, using homologous recombination
to stably insert a gene into the fungal host and/or to destroy the ability of the
host cell to express a certain protein.

Fungi useful as recombinant hosts for the purpose of the invention
include, e.g., Trichoderma, Aspergillus, Claviceps purpurea, Penicillium
chrysogenum, Magnaporthe grisea, Neurospora, Mycosphaerella spp.,
Colletotrichum trifolii, the dimorphic fungus Histoplasma capsulatum, Nectria
haematococca (anamorph: Fusarium solani f. sp. phaseoli and f. sp. pisi),
Ustilago violacea, Ustilago maydis, Cephalosporium acremonium,
Schizophyllum commune, Podospora anserina, Sordaria macrospora, Mucor
circinelloides, and Colletotrichum capsici. Transformation and selection
techniques for each of these fungi have been described (reviewed in
Finkelstein, D.B. in: Biotechnology of Filamentous Fungi: Technology and
Products, Chapter 6, Finkelstein, D.B. et al., eds., Butterworth-Heinemann,
publishers, Stoneham, MA, (1992), pp. 113-156). Especially preferred are
Trichoderma reesei, T. harzianum, T. longibrachiatum, T. viride, T. koningii,
Aspergillus nidulans, A. niger, A. terreus, A. ficuum, A. oryzae, A. awamori
and Neurospora crassa.

The hosts of the invention are meant to include all Trichoderma. 
Trichoderma are classified on the basis of morphological evidence of 
similarity. T. reesei was formerly known as T. viride Pers. or T. koningii 
Oudem; sometimes it was classified as a distinct species of the 
T. longibrachiatum group. The entire genus Trichoderma, in general, is 
characterized by rapidly growing colonies bearing tufted or pustulate,
repeatedly branched conidiophores with lageniform phialides and hyaline or 
green conidia borne in slimy heads (Bissett, J., Can. J. Bot. 62:924-931 
(1984)). 

The fungus called T. reesei is clearly defined as a genetic family 
originating from the strain QM6a, that is, a family of strains possessing a 
common genetic background originating from a single nucleus of the particular 
isolate QM6a. Only those strains are called T. reesei. 

Classification by morphological means is problematic, and the first
recently published molecular data from DNA-fingerprint analysis and the
hybridization pattern of the cellobiohydrolase 2 (cbh2) gene in T. reesei and
T. longibrachiatum clearly indicate a differentiation of these strains (Meyer,
W. et al., Curr. Genet. 21:27-30 (1992); Morawetz, R. et al., Curr. Genet.
21:31-36 (1992)).

However, there is evidence of similarity between different Trichoderma 
species at the molecular level that is found in the conservation of nucleic acid 
and amino acid sequences of macromolecular entities shared by the various 
Trichoderma species. For example, Cheng, C., et al., Nucl. Acids Res.
18:5559 (1990), discloses the nucleotide sequence of T. viride cbh1. The gene
was isolated using a probe based on the T. reesei sequence. The authors note
that there is a 95% homology between the amino acid sequences encoded by
the T. viride and T. reesei genes. Goldman, G.H. et al., Nucl. Acids Res. 18:6117
(1990), discloses the nucleotide sequence of phosphoglycerate kinases from
T. viride and notes that the deduced amino acid sequence is 81% homologous
with the phosphoglycerate kinase gene from T. reesei. Thus, the species
classified as T. viride and T. reesei must genetically be very close to each
other.

In addition, there is a high similarity of transformation conditions 
among the Trichoderma. Although practically all the industrially important 
species of Trichoderma can be found in the formerly discussed Trichoderma
section Longibrachiatum, there are some other species of Trichoderma that are
not assigned to this section. Such a species is, for example, Trichoderma
harzianum, which acts as a biocontrol agent against plant pathogens. A
transformation system has also been developed for this Trichoderma species
(Herrera-Estrella, A. et al., Molec. Microbiol. 4:839-843 (1990)) that is
essentially the same as that taught in the application. Thus, even though
Trichoderma harzianum is not assigned to the section Longibrachiatum, the
method used by Herrera-Estrella in the preparation of spheroplasts before 
transformation is the same. The teachings of Herrera-Estrella show that there 
is not a significant diversity of Trichoderma spp. such that the transformation 
system of the invention would not be expected to function in all Trichoderma. 

Further, there is a common functionality of fungal transcriptional 
control signals among fungal species. At least three A. nidulans promoter
sequences, amdS, argB, and gpd, have been shown to give rise to gene
expression in T. reesei. For amdS and argB, only one or two copies of the
gene are sufficient to bring about a selectable phenotype (Penttila et al., Gene
61:155-164 (1987)). Gruber, F. et al., Curr. Genet. 18:71-76 (1990) also
note that fungal genes can often be successfully expressed across different
species. Therefore, it is to be expected that the glucose regulated promoters
identified herein would also be regulatable by glucose in other fungi. Except
for cbh1, it is understood that the glucose regulated promoters of the invention
may not be directly regulated by glucose, but rather that they function
regardless of its presence.

Many species of fungi, and especially Trichoderma, are available from
a wide variety of resource centers that contain fungal culture collections. In
addition, Trichoderma species are catalogued in various databases. These
resources and databases are summarized by O'Donnell, K. et al., in
Biotechnology of Filamentous Fungi: Technology and Products, D.B.
Finkelstein et al., eds., Butterworth-Heinemann, Stoneham, MA, USA, 1992,
pp. 3-39.

After the introduction of the vector and selection of the transformant, 
recipient cells are grown in a selective medium, which selects for the growth 
of vector-containing cells. Expression of the cloned gene sequence(s) results 
in the synthesis and secretion of the desired heterologous or homologous 
protein, or in the production of a fragment of this protein, into the medium of 
the host cell. 

In a preferred embodiment, the coding sequence is the sequence of an 
enzyme that is capable of hydrolysing lignocellulose. Examples of such 
sequences include a DNA sequence encoding cellobiohydrolase I (CBHI), 
cellobiohydrolase II (CBHII), endoglucanase I (EGI), endoglucanase II (EGII), 
endoglucanase III (EGIII), β-glucosidases, xylanases (including endoxylanases
and β-xylosidase), side-group cleaving activities (for example,
α-arabinosidase, α-D-glucuronidase, and acetyl esterase), mannanases, pectinases
(for example, endo-polygalacturonase, exo-polygalacturonase, pectinesterase,
or pectin and pectin acid lyase), and enzymes of lignin polymer degradation
(for example, lignin peroxidase LIII from Phlebia radiata (Saloheimo et al.,
Gene 85:343-351 (1989)), or the gene for another ligninase, laccase or Mn
peroxidase (Kirk, In: Biochemistry and Genetics of Cellulose Degradation,
Aubert et al. (eds.), FEMS Symposium No. 43, Academic Press, Harcourt
Brace Jovanovich Publishers, London, pp. 315-332 (1988))). The cloning of
the cellulolytic enzyme genes has been described and recently reviewed (Teeri,
T.T. in: Biotechnology of Filamentous Fungi: Technology and Products,
Chapter 14, Finkelstein, D.B. et al., eds., Butterworth-Heinemann, publishers,
Stoneham, MA, (1992), pp. 417-445). The gene for the native
cellobiohydrolase CBHI sequence has been cloned by Shoemaker et al.
(Shoemaker, S., et al., Bio/Technology 1:691-696 (1983)) and Teeri et al.
(Teeri, T., et al., Bio/Technology 1:696-699 (1983)) and the entire nucleotide
sequence of the gene is known (Shoemaker, S., et al., Bio/Technology
1:691-696 (1983)). From T. reesei, the gene for the major endoglucanase (EGI) has
also been cloned and characterized (Penttila, M., et al., Gene 45:253-263
(1986); Patent Application EP 137,280; Van Arsdell, J.N.V., et al.,
Bio/Technology 5:60-64). Other isolated cellulase genes include cbh2 (Patent
Application WO 85/04672; Chen, C.M., et al., Bio/Technology 5:274-278
(1987)) and egl3 (Saloheimo, M., et al., Gene 63:11-21 (1988)). The genes
for the two endo-β-xylanases of T. reesei (xln1 and xln2) have been cloned and
described in applicants' copending application, U.S. 07/889,893, filed May
29, 1992. The xylanase proteins have been purified and characterized
(Tenkanen, M. et al., Proceeding of the Xylans and Xylanases Symposium,
Wageningen, Holland (1991)).

The expressed protein may be isolated and purified from the medium
of the host in accordance with conventional conditions, such as extraction,
precipitation, chromatography, affinity chromatography, electrophoresis, or the
like. For example, the cells may be collected by centrifugation, or with
suitable buffers, lysed, and the protein isolated by column chromatography,
for example, on DEAE-cellulose, phosphocellulose, polyribocytidylic
acid-agarose, hydroxyapatite or by electrophoresis or immunoprecipitation.

The manner and method of carrying out the present invention may be
more fully understood by those of skill by reference to the following
examples, which examples are not intended in any manner to limit the scope
of the present invention or of the claims directed thereto.

Example 1

Isolation of Trichoderma reesei Genes Strongly Expressed on Glucose

For the isolation of glucose induced mRNA, Trichoderma reesei strain
QM9414 (Mandels, M. et al., Appl. Microbiol. 21:152-154 (1971)) was grown
in a 10 liter fermenter in glucose medium (glucose 60 g/l, Bacto-Peptone 5
g/l, yeast extract 1 g/l, KH2PO4 4 g/l, (NH4)2SO4 4 g/l, MgSO4 0.5 g/l, CaCl2
0.5 g/l and trace elements FeSO4·7H2O 5 mg/l, MnSO4·H2O 1.6 mg/l,
ZnSO4·7H2O 1.4 mg/l, and CoCl2·6H2O 3.7 mg/l, pH 5.0-4.0). Glucose
feeding (465 g/20 h) was started after 30 hours of growth. Mycelium was
harvested at 45 hours of growth and RNA was isolated according to Chirgwin,
J.M. et al. (Biochemistry 18:5294-5299 (1979)). Poly A+ RNA was isolated
from the total RNA by oligo(dT)-cellulose chromatography (Maniatis, T.
et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor
Laboratory, Cold Spring Harbor, NY (1982)) and cDNA synthesis and cloning
of the cDNAs into the lambda-ZAP vector were carried out according to the
manufacturer's instructions (ZAP-cDNA synthesis kit, Stratagene). The cDNA bank
was transferred onto nitrocellulose filters and screened with 32P-labelled
single-stranded cDNA synthesized (Teeri, T.T. et al., Anal. Biochem. 164:60-67
(1987)) from the same poly A+ RNA from which the bank was constructed.
The labelled cDNA was relabelled with 32P-dCTP (Random Primed DNA
Labeling kit, Boehringer-Mannheim). The hybridization conditions were as
described in Maniatis, T. et al., Molecular Cloning: A Laboratory Manual,
Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1982). Fifty
clones giving the strongest positive reaction were isolated and the cDNAs were
subcloned in vivo into Bluescript SK(-) plasmid according to the manufacturer's
instructions (ZAP-cDNA synthesis kit, Stratagene).
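As a rough aid for reproducing the cultivation described at the start of this example, the following Python sketch scales the stated per-liter medium composition to a 10 liter batch and converts the glucose feed to an average hourly rate. It assumes that the whole 10 liters is medium volume and that "465 g/20 h" means 465 g fed evenly over 20 hours; neither is stated explicitly in the text, and the names used are illustrative only.

```python
# Per-liter composition of the glucose medium as listed in Example 1.
MACRO_G_PER_L = {
    "glucose": 60.0,
    "Bacto-Peptone": 5.0,
    "yeast extract": 1.0,
    "KH2PO4": 4.0,
    "(NH4)2SO4": 4.0,
    "MgSO4": 0.5,
    "CaCl2": 0.5,
}
TRACE_MG_PER_L = {
    "FeSO4.7H2O": 5.0,
    "MnSO4.H2O": 1.6,
    "ZnSO4.7H2O": 1.4,
    "CoCl2.6H2O": 3.7,
}


def scale(recipe_per_l, volume_l):
    """Return the amount of each component needed for volume_l liters."""
    return {name: amount * volume_l for name, amount in recipe_per_l.items()}


def feed_rate_g_per_h(total_glucose_g, duration_h):
    """Average feed rate, assuming the glucose is fed evenly (an assumption)."""
    return total_glucose_g / duration_h


if __name__ == "__main__":
    for name, grams in scale(MACRO_G_PER_L, 10).items():
        print(f"{name}: {grams:g} g")
    for name, milligrams in scale(TRACE_MG_PER_L, 10).items():
        print(f"{name}: {milligrams:g} mg")
    print(f"average glucose feed: {feed_rate_g_per_h(465, 20):.2f} g/h")
```

The same helper can be reused for the cultivation in Example 5 by passing 555 g and 22 h to `feed_rate_g_per_h`.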

To identify the clones and exclude duplicates, they were all
sequenced from the 3' end by using standard methods. The frequency of each
specific clone in the cDNA lambda-bank was determined by hybridizing the
bank with a clone specific PCR probe. The clones cDNA33, cDNA1,
cDNA10, cDNA12, and cDNA15, showing the five highest frequencies,
corresponded to 1-3% of the total mRNA pool.

Example 2

Characterization of Isolated Glucose Expressed Trichoderma Genes and
Their Promoters

The cDNAs of the clones cDNA33, cDNA1, cDNA10, cDNA12, and
cDNA15 were used as probes to isolate the corresponding genes and
promoters from a Trichoderma chromosomal lambda-bank prepared earlier
(Vanhanen, S. et al., Curr. Genet. 15:181-186 (1989)). On the basis of
Southern analysis of restriction enzyme digestions carried out for the
chromosomal lambda clones, the promoters and either the 5' parts of the
chromosomal genes or the whole genes were subcloned into pSP73 vector
(Promega, Madison, USA) using appropriate restriction enzymes, yielding the
plasmids pTHN1 (Figure 1), pEA33 (Figure 2), pTHN3 (Figure 3), pEA10
(Figure 4), pEA12 (Figure 5) and pEA155 (Figure 6), corresponding to the
clones cDNA33, cDNA1, cDNA10, cDNA12 and cDNA15, respectively.
Sequences were obtained from the 5' ends of the genes and from the
promoters using primers designed from previously obtained sequences. The
sequences of the isolated promoters and genes or parts of them (either obtained
from cDNA or chromosomal DNA) are shown in SEQ ID1 for cDNA33, SEQ
ID2 for cDNA1, SEQ ID3 for cDNA10, SEQ ID4 for cDNA12, and SEQ ID5
for cDNA15. Based on sequence similarity to known sequences in a protein
data bank, the clone cDNA33 could be identified as a translation elongation
factor, TEF1α.

Example 3

Construction of Vectors for Expression of EGI-core under the tef1 Promoter
in Trichoderma

A XhoI + DraIII fragment that is internal to the egl1 cDNA [SEQ ID
16 and Figure 7A] sequence of plasmid pPLE3 (Figure 7), which carries the
EcoRI-BamHI fragment of egl1 cDNA from pTTc11 (Penttila et al., Gene
45:253-263 (1986); Penttila et al., Yeast 3:175-185 (1987)) in between the cbh1
promoter and a c. 700 nt long AvaII terminator fragment, was replaced by a
XhoI-DraIII fragment of cDNA from plasmid pEG131 (Nitisinprasert, S.,
Reports from Department of Microbiology, University of Helsinki (1990)).
The pPEG131 insert sequence is egl1 cDNA in which a STOP codon is
constructed just before the hinge region of the egl1 gene. The cbh1 terminator
sequence is shown in Figure 7B [SEQ ID 23]. SEQ ID 23 is a shortened cbh1
terminator sequence, similar to SEQ ID 24 (the "long" cbh1 terminator but
lacking 30 nucleotides at the 5' end).

pPLE3 contains a pUC18 backbone, and carries the cbh1 promoter
inserted at the EcoRI site. The cbh1 promoter is operably linked to the full
length egl1 cDNA coding sequence and to the cbh1 transcriptional terminator.
The ori and amp genes are from the bacterial plasmid.

The resulting plasmid pEM-3 (Figure 8) now carries a copy of egl1
cDNA with a translational stop codon after the egl1 core region (EGI amino
acids 1-22 are the EGI signal sequence; EGI amino acids 23-393, terminating
at a Thr, are considered the 'core' sequence). pEM-3 was then digested with
EcoRI and SphI and the released Bluescribe M13+ moiety (Vector Cloning
Systems, San Diego, USA) of the plasmid was replaced by EcoRI and SphI
digested pAMD (Figure 8) containing a 3.4 kb amdS fragment from plasmid
p3SR2 (Hynes, M.J. et al., Mol. Cell. Biol. 3:1430-1439 (1983); Tilburn, J.
et al., Gene 26:205-221 (1983)). This resulting plasmid pEM-3A (Figure 8)
was digested with EcoRI and partially with KspI to release the 2.3 kb fragment
carrying the cbh1 promoter, and the 8.6 kb fragment carrying the rest of the
plasmid was purified from agarose gel. Based on the sequence data of the tef1
promoter (SEQ ID1 bases 1-1234), two primers were designed (SEQ ID6 and
SEQ ID7) and used in a PCR reaction to isolate a 1.2 kb promoter fragment
adjacent to the translational start site of the tef1 gene. The 5' primer was

ACCG GAATTCa T ATCTAGAG GftGCCCGCGAjSTTTGGATACGCC (SEQ ID6) 

and the 3' primer was 

ACCGCCGC^CTTTGACGGTTTGTGTGATGTAGCG (SEQ ID7). 

The bold and underlined GAATTC in the 5' primer is an EcoRI site. The
bold and underlined TCTAGA in the 5' primer is an XbaI site. The bold and
underlined CCGCGG in the 3' primer is a SacII site. This fragment was
digested with EcoRI and partially with KspI, purified from agarose gel and
ligated to the 8.6 kb pEM-3A fragment, resulting in plasmid pTHN100B
(Figure 9). This expression vector carries DNA encoding the EGI-core
construction operably linked to the tef1 promoter; this plasmid also carries an
amdS marker gene for selection of Trichoderma transformants.

Example 4

Transformation of Trichoderma, Purification of the EGI-Core Producing
Clones and Their Analysis

Trichoderma reesei strain QM9414 was transformed essentially as
described (Penttila, M. et al., Gene 61:155-164 (1987)) using 6-10 µg of the
plasmid pTHN100B. The Amd+ transformants obtained were streaked twice
onto slants containing acetamide (Penttila, M. et al., Gene 61:155-164 (1987)).
Thereafter spore suspensions were made from transformants grown on Potato
Dextrose agar (Difco). EGI-core production was tested by slot blotting with
EGI specific antibody from 50 ml shake flask cultures carried out in minimal
medium (Penttila, M. et al., Gene 61:155-164 (1987)) supplemented with 5%
glucose and using additional glucose feeding (total amount of fed glucose was
6 ml of 20% glucose). The spore suspensions of the EGI-core producing
clones were purified to single spore cultures on Potato Dextrose agar plates.
EGI-core production was analyzed again from these purified clones as
described above (Figure 10).

Example 5

Characterization of EGI-core produced by Trichoderma Grown on Glucose

EGI-core producing strain pTHN100B-16c was grown in a 10 liter
fermenter in glucose medium as described earlier in Example 1 except that
yeast extract was left out and glucose feeding was 555 g/22 h. The culture
supernatant was separated from the mycelium by centrifugation. The secretion
of EGI-core by Trichoderma was verified by Western blotting by conventional
methods, running concentrated culture supernatants on SDS-PAGE and treating
the blotted filter with monoclonal EGI-core specific antibodies (Figure 11 and
Figure 12). The enzyme activity was shown semiquantitatively in a microtiter
plate assay by using the concentrated culture supernatants and 3 mM
chloronitrophenyl lactoside as a substrate and measuring the absorbance at 405
nm (Claeyssens, M. et al., Biochem. J. 261:819-825 (1989)).

Example 6

Construction of β-Galactosidase Expression Vectors with Truncated
Fragments of the cbh1 Promoter

The vector pML016 (Figure 13) contains a 2.3 kb cbh1 promoter
fragment ([SEQ ID18], Figure 13A) starting at the 5' end from the EcoRI site,
isolated from a chromosomal gene bank of Trichoderma reesei (Teeri, T. et al.,
Bio/Technology 1:696-699 (1983)), a 3.1 kb BamHI fragment of the lacZ
gene from plasmid pAN924-21 (van Gorcom et al., Gene 40:99-106 (1985))
and a 1.6 kb cbh1 terminator (Figure 13B, [SEQ ID 24]) starting from 84 bp
upstream from the translation stop codon and extending to a BamHI site at the
3' end (Shoemaker, S. et al., Bio/Technology 1:691-696 (1983); Teeri, T.
et al., Bio/Technology 1:696-699 (1983)). These pieces were linked to a 2.3
kb long EcoRI-PvuII region of pBR322 (Sutcliffe, J.G., Cold Spring Harbor
Symp. Quant. Biol. 43:77-90 (1979)) generating junctions as shown in Figure
13. The exact in frame joint between the 2.3 kb cbh1 promoter and the 3.1
kb lacZ gene was constructed by using an oligo depicted in Figure 13. A
polylinker shown in Figure 13 was cloned into the single internal XbaI site in
the cbh1 promoter for the purpose of promoter deletions. A short SalI linker
shown in Figure 13 was cloned into the joint between the pBR322 and cbh1
promoter fragments so that the expression cassette can be released from the
vector by restriction digestion with SalI and SphI. Progressive unidirectional
deletions were introduced to the cbh1 promoter by cutting the vector with
KpnI and XhoI and using the Erase-A-Base System (Promega, Madison, USA)
according to the manufacturer's instructions. Plasmids obtained from different
deletion time points were transformed into the E. coli strain DH5α (BRL) by
the method described in Hanahan, D. (J. Mol. Biol. 166:557-580 (1983)) and
the deletion end points were sequenced by using standard methods.

Example 7

Transformation of Trichoderma, Isolation of the β-Galactosidase Producing
Clones and Their Analysis

Trichoderma reesei strain QM9414 was transformed with expression
vectors for β-galactosidase containing either the intact 2.3 kb cbh1 promoter
or truncated versions of it, generated as explained in Example 6. Twenty µg
of the plasmids were digested with SalI and SphI to release the expression
cassettes from the vectors and these mixtures were cotransformed to
Trichoderma together with 3 µg of plasmid p3SR2 (Hynes, M.J. et al., Mol.
Cell. Biol. 3:1430-1439 (1983)) containing the acetamidase gene. The
transformation method was that described in Penttila, M. et al. (Gene
61:155-164 (1987)) and the Amd+ transformants were screened as described earlier
in Example 4. The β-galactosidase production of the Amd+ transformants was
tested by inoculating spore suspensions on microtiter plate wells containing
solid minimal medium (Penttila, M. et al., Gene 61:155-164 (1987))
supplemented with 2% glucose, 2% fructose and 0.2% peptone, with the pH
adjusted to 7. After 24 h incubation at 28°C, 10 µl of the chromogenic
substrate X-gal (20 mg/ml) was added to each well and the formation of blue
color was followed as an indication of β-galactosidase activity. An intense
blue color could be detected in transformants transformed with a plasmid
pML016del5(11) (Figure 14) containing a 1110 bp deletion in the cbh1
promoter beginning from the promoter internal polylinker and ending 385 bp
before the translation initiation site (Figure 15). The sequence of this
truncated promoter is provided as SEQ ID19 (Figure 15A).

Example 8

Production of CBHI on Glucose with the Glucose-Derepressed cbh1
Promoter

For the production of CBHI on glucose, an expression plasmid
pML017 (Figure 16) was constructed. The plasmid pML016del5(11) was
digested with the enzymes KspI (the first nucleotide of the recognition
sequence is at the position -16 from the ATG) and XmaI (the first nucleotide
of the recognition sequence is 76 nucleotides downstream from the translation
stop codon of the cbh1 gene). The vector part containing the shortened cbh1
promoter, the cbh1 terminator and the pBR322 sequence was ligated to the
chromosomal cbh1 gene isolated as a KspI-XmaI fragment from the
chromosomal gene bank of Trichoderma reesei (Teeri, T. et al.,
Bio/Technology 1:696-699 (1983)). The sequence of this fragment is provided
as the underlined portion of Figure 16A ([SEQ ID17]). The plasmid pML017
was transformed to the Trichoderma reesei strain QM9414 and the Amd+
transformants were screened as described earlier in Example 7. CBHI
production was tested from 40 transformants in microtiter plate cultures (200
µl; 3 days) carried out in minimal medium (Penttila, M. et al., Gene
61:155-164 (1987)) supplemented with 3% glucose and using additional glucose
feeding (total amount of fed glucose was 6 mg/200 µl culture). The culture
supernatants were slot blotted on nitrocellulose filters and CBHI was detected
with specific antibody. The spore suspensions of the 10 best CBHI producing
transformants were purified to single spore cultures on plates containing
acetamide and Triton X-100 (Penttila, M. et al., Gene 61:155-164 (1987)).
Thirty single spore cultures were tested for CBHI production in shake flask
cultivations (50 ml; 6 days) carried out in the same medium as described
above. The total amount of fed glucose was 1.8 g/50 ml culture. Dilutions of
the culture supernatants were slot blotted and CBHI was detected with specific
antibody (Figure 17).
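To compare the glucose feeds quoted in Examples 4 and 8 on a common basis, the sketch below converts each stated amount to grams of fed glucose per liter of initial culture volume. It assumes that "20% glucose" means 20 g per 100 ml (w/v) and ignores the volume added by feeding; both are assumptions made for illustration, and the function names are hypothetical.

```python
def grams_in_solution(volume_ml, percent_w_v):
    """Grams of solute in volume_ml of a percent (w/v) solution."""
    return volume_ml * percent_w_v / 100.0


def fed_glucose_g_per_l(glucose_g, culture_ml):
    """Fed glucose expressed per liter of initial culture volume."""
    return glucose_g / culture_ml * 1000.0


if __name__ == "__main__":
    # Example 4: 6 ml of 20% glucose fed to a 50 ml shake flask culture.
    example4 = fed_glucose_g_per_l(grams_in_solution(6, 20), 50)
    # Example 8: 6 mg fed to a 200 microliter microtiter culture.
    example8_plate = fed_glucose_g_per_l(0.006, 0.2)
    # Example 8: 1.8 g fed to a 50 ml shake flask culture.
    example8_flask = fed_glucose_g_per_l(1.8, 50)
    print(f"Example 4 shake flask: {example4:.1f} g/l")
    print(f"Example 8 microtiter:  {example8_plate:.1f} g/l")
    print(f"Example 8 shake flask: {example8_flask:.1f} g/l")
```

Under these assumptions the three feeds fall in the same range, a few percent glucose on top of the glucose already present in the medium.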



Example 9

β-Galactosidase Expression Vectors with Specific Mutations in cbh1
Promoter to Release Glucose Repression

Three 6 bp sequences found in the cbh1 promoter, similar to binding sites
of the Saccharomyces cerevisiae glucose repressor protein MIG1 (Nehlin &
Ronne, EMBO J. 9:2891-2899 (1990); Nehlin et al., EMBO J. 10:3373-3377
(1991)), were changed into other nucleotides to study the functionality of these
mig-like sequences in mediating the glucose repression of the native cbh1
promoter of Trichoderma reesei. To construct β-galactosidase expression
vectors with cbh1 promoters carrying specific mutations, sequence alterations
were made into primers (specifically:
TCT TCA AGA ATT GCT CGA CCA ATT CTC ACG GTG AAT GTA GG (SEQ ID 8);
ACA CAT CTA GAG GTG ACC TAG GCA TTC TGG CCA CTA GAT ATA TAT TTA GAA
GGT TCT TGT AGC TCA AAA GAG C (SEQ ID 9);
GGG AAT TCT CTA GAA ACG CGT TGG CAA ATT ACG GTA CG (SEQ ID 10);
GGG AAT TCG GTC ACC TCT AAA TGT GTA ATT TGC CTG CTT GAC C (SEQ ID 11);
GGG AAT TCG GTC ACC TCT AAA TGT GTA ATT TGC CTG CTT GAC CGA TCT AAA
CTG TTC GAA GCC CGA ATG TAG G (SEQ ID 12);
GGG AAT TCT TCT AGA TTG CAG AAG CAC GGC AAA GCC CAC TTA CCC (SEQ ID 13);
TAG CGA ATT CTA GGT CAC CTC TAA AGG TAC CCT GCA GCT CGA GCT AG (SEQ ID 14);
and GGG AAT TCA TGA TGC GCA GTC CGC GG (SEQ ID 15));
these primers were specific for the cbh1 promoter and the cbh1 promoter
internal polylinker and were used in PCR amplification of cbh1 promoter
sequences for cloning.

WO 94/04673
PCT/FI93/00330
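The restriction sites carried by these primers can be checked mechanically. The sketch below scans four of the primers listed above (SEQ ID 10, 11, 13 and 15) for the EcoRI, XbaI and KspI/SacII recognition sequences named in Example 3, plus BstEII, whose recognition sequence GGTNACC is taken from general knowledge rather than from this text. The dictionary and function names are illustrative, and the primer strings are transcriptions of the listing above, so any transcription error would carry over.

```python
import re

# Recognition sequences; N in GGTNACC is written as a character class.
SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "KspI/SacII": "CCGCGG",
    "BstEII": "GGT[ACGT]ACC",
}

PRIMERS = {
    "SEQ ID 10": "GGG AAT TCT CTA GAA ACG CGT TGG CAA ATT ACG GTA CG",
    "SEQ ID 11": "GGG AAT TCG GTC ACC TCT AAA TGT GTA ATT TGC CTG CTT GAC C",
    "SEQ ID 13": "GGG AAT TCT TCT AGA TTG CAG AAG CAC GGC AAA GCC CAC TTA CCC",
    "SEQ ID 15": "GGG AAT TCA TGA TGC GCA GTC CGC GG",
}


def normalize(sequence):
    """Remove whitespace and upper-case a primer written in triplets."""
    return re.sub(r"\s+", "", sequence).upper()


def find_sites(sequence):
    """Map each enzyme name to the 0-based start positions of its site."""
    sequence = normalize(sequence)
    hits = {}
    for enzyme, pattern in SITES.items():
        # A lookahead lets overlapping occurrences be reported as well.
        positions = [m.start() for m in re.finditer(f"(?={pattern})", sequence)]
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, find_sites(primer))
```

Each of these primers carries an EcoRI site near its 5' end followed by the site used for the subsequent ligation (XbaI, BstEII or KspI), which matches the enzyme pairs used in the digestions described below.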

pML016 (Figure 13) was used as a PCR template with the appropriate
primers to yield a 770 bp fragment A (primers TAG CGA ATT CTA GGT CAC
CTC TAA AGG TAC CCT GCA GCT CGA GCT AG (SEQ ID 14) and GGG AAT TCT
CTA GAA ACG CGT TGG CAA ATT ACG GTA CG (SEQ ID 10)), beginning at the
polylinker at -1500 and ending at -720 upstream of ATG, and a 720 bp
fragment B (primers GGG AAT TCT TCT AGA TTG CAG AAG CAC GGC AAA GCC
CAC TTA CCC (SEQ ID 13) and GGG AAT TCA TGA TGC GCA GTC CGC GG
(SEQ ID 15)), beginning at -720 and ending at KspI at -16. Fragments A and
B were purified from agarose gel, digested with BstEII-XbaI and XbaI-KspI,
respectively, and ligated to the 7.8 kb fragment of pML016 to produce pMI-24.
The resulting cbh1 promoter carries a sequence alteration (genomic sequence:
5' GTGGGG, altered sequence: 5' TCTAGA) at position -720 to -715
upstream of the translation initiation codon of intact cbh1 promoter (Figure
18). The sequence of the altered cbh1 promoter in pMI-24 is provided in
Figure 18A and SEQ ID20.

pML016del0(2) (Figure 19), containing a 460 bp deletion in the cbh1
promoter beginning from the promoter internal polylinker and ending 1025 bp
before the translation initiation site, was constructed as described in Example
6 and used as a PCR template with primers (TCT TCA AGA ATT GCT CGA CCA
ATT CTC ACG GTG AAT GTA GG (SEQ ID 8) and ACA CAT CTA GAG GTG ACC
TAG GCA TTC TGG CCA CTA GAT ATA TAT TTA GAA GGT TCT TGT AGC TCA
AAA GAG C (SEQ ID 9)) to yield an 800 bp fragment C, beginning from the 5'
end of cbh1 promoter and ending at the promoter internal polylinker.
Fragment C was purified from agarose gel, digested with SalI-XbaI and ligated
to the 7.6 kb SalI-XbaI fragment of pML016del0(2) to produce pMI-25. The
cbh1 promoter of pMI-25 has a sequence alteration (genomic sequence:
5' GTGGGG, altered sequence: 5' TCTAAA) at position -1505 to -1500 upstream
of the translation initiation codon of intact cbh1 promoter (Figure 18).

pML016del0(2) was used as a PCR template to yield a 750 bp
fragment D (primers GGG AAT TCG GTC ACC TCT AAA TGT GTA ATT TGC CTG
CTT GAC CGA TCT AAA CTG TTC GAA GCC CGA ATG TAG G (SEQ ID 12) and
GGG AAT TCA TGA TGC GCA GTC CGC GG (SEQ ID 15)), beginning from the
promoter internal polylinker and ending at KspI at -16. Fragment D was
purified from agarose gel, digested with BstEII-KspI and ligated to the 7.8 kb
BstEII-KspI fragment of pMI-25 to produce pMI-26. The cbh1 promoter of
pMI-26 has sequence alterations at positions -1505 to -1500 (genomic sequence:
5' GTGGGG, altered sequence: 5' TCTAAA) and -1001 to -996 (genomic
sequence: 5' CTGGGG, altered sequence: 5' TCTAAA) upstream of the
translation initiation codon of intact cbh1 promoter (Figure 18).

pML016del0(2) was used as a PCR template to yield a 280 bp
fragment E (primers GGG AAT TCT CTA GAA ACG CGT TGG CAA ATT ACG GTA
CG (SEQ ID 10) and GGG AAT TCG GTC ACC TCT AAA TGT GTA ATT TGC CTG
CTT GAC C (SEQ ID 11)), beginning from the promoter internal polylinker
and ending at -720, and a 720 bp fragment F (primers GGG AAT TCT TCT AGA
TTG CAG AAG CAC GGC AAA GCC CAC TTA CCC (SEQ ID 13) and GGG AAT
TCA TGA TGC GCA GTC CGC GG (SEQ ID 15)), beginning at -720 and ending
at KspI at -16. Fragments E and F were purified from agarose gel, digested
with BstEII-XbaI and XbaI-KspI, respectively, and ligated to the 7.8 kb
BstEII-KspI fragment of pMI-25 to produce pMI-27. The cbh1 promoter of pMI-27
has sequence alterations at positions -1505 to -1500 (genomic sequence:
5' GTGGGG, altered sequence: 5' TCTAAA) and -720 to -715 (genomic sequence:
5' GTGGGG, altered sequence: 5' TCTAGA) upstream of the translation
initiation codon of intact cbh1 promoter (Figure 18). The sequence of the
altered cbh1 promoter of pMI-27 is shown in Figure 18C and SEQ ID21.

pML016del0(2) was used as a PCR template to yield a 280 bp
fragment G (primers GGG AAT TCT CTA GAA ACG CGT TGG CAA ATT ACG GTA
CG (SEQ ID 10) and GGG AAT TCG GTC ACC TCT AAA TGT GTA ATT TGC CTG
CTT GAC CGA TCT AAA CTG TTC GAA GCC CGA ATG TAG G (SEQ ID 12)),
beginning from the promoter internal polylinker and ending at -720, and a 720
bp fragment H (primers GGG AAT TCT TCT AGA TTG CAG AAG CAC GGC AAA
GCC CAC TTA CCC (SEQ ID 13) and GGG AAT TCA TGA TGC GCA GTC CGC GG
(SEQ ID 15)), beginning at -720 and ending at KspI at -16. Fragments G and
H were purified from agarose gel, digested with BstEII-XbaI and XbaI-KspI,
respectively, and ligated to the 7.8 kb BstEII-KspI fragment of pMI-25 to
produce pMI-28. The cbh1 promoter of pMI-28 has sequence alterations at
positions -1505 to -1500 (genomic sequence: 5' GTGGGG, altered sequence:
5' TCTAAA), -1001 to -996 (genomic sequence: 5' CTGGGG, altered sequence:
5' TCTAAA), and -720 to -715 (genomic sequence: 5' GTGGGG, altered sequence:
5' TCTAGA) upstream of the translation initiation codon of intact cbh1
promoter (Figure 18). The sequence of the altered cbh1 promoter of pMI-28
is shown in Figure 18C and SEQ ID22.

All PCR amplified DNA fragments and ligation joints were sequenced
using standard methods to ensure that the mutations were present and no other
nucleotides were changed. Transformation of Trichoderma reesei QM9414
with the vectors mentioned above, isolation of β-galactosidase producing
clones and their analysis was done as described in Example 7. After addition
of X-gal, an intense blue color was detected on glucose grown transformant
colonies as an indication of β-galactosidase activity in transformants
transformed with the plasmids pMI-24, pMI-27 and pMI-28 (Figure 20),
indicating that altering the cbh1 promoter according to any of those mutations
was sufficient to allow for expression of proteins in Trichoderma under the
cbh1 promoter in the presence of glucose.
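The constructs of this example are easier to compare when tabulated. The sketch below records, for each plasmid, which mig-like sites are altered (identified by the upstream position of the first altered base, as given in the text) and computes which altered sites are common to the three plasmids reported to give an intense blue color on glucose. It is only a bookkeeping aid built from the positions stated above; the variable and function names are illustrative, and the text does not report the color reaction for pMI-25 or pMI-26.

```python
# Altered mig-like sites per plasmid, keyed by the position of the
# first altered base upstream of the translation initiation codon.
MUTATED_SITES = {
    "pMI-24": {-720},
    "pMI-25": {-1505},
    "pMI-26": {-1505, -1001},
    "pMI-27": {-1505, -720},
    "pMI-28": {-1505, -1001, -720},
}

# Plasmids for which an intense blue color on glucose is reported.
BLUE_ON_GLUCOSE = ["pMI-24", "pMI-27", "pMI-28"]


def sites_shared_by(plasmids):
    """Return the altered sites present in every listed plasmid."""
    return set.intersection(*(MUTATED_SITES[name] for name in plasmids))


if __name__ == "__main__":
    for name, sites in MUTATED_SITES.items():
        print(name, sorted(sites))
    print("shared by blue transformants:", sorted(sites_shared_by(BLUE_ON_GLUCOSE)))
```

In this tabulation the only alteration shared by pMI-24, pMI-27 and pMI-28 is the one at -720 to -715.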
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT:
(A) NAME: ALKO Ltd
(B) STREET: Salmisaarenranta 7 H
(C) CITY: Helsinki
(D) COUNTRY: Finland
(E) POSTAL CODE: FIN-00180

(ii) TITLE OF INVENTION: Fungal Promoters Active In The Presence Of
Glucose

(iii) NUMBER OF SEQUENCES: 28

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: ALKO Ltd, Law Department/Patents
(B) POSTAL ADDRESS: P.O. Box 350
(C) CITY: Helsinki
(D) COUNTRY: Finland
(E) POSTAL CODE: FIN-00101

(v) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: US 07/932,485
(B) FILING DATE: 19-AUG-1992

(vi) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 358-0-13311
(B) TELEFAX: 358-0-1333346



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3461 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CGCCGTGACG ACAGAAACGG AGCCCGCGAG TTTGGATACG CCGCTGAAAT GGGGCTTGAC 60
GGTGAAGGAG AAGCCGAGCG CGGTGCCAGA GGACAAGATG GATGTAGAGC CAGGCGACGA 120
CGACCAAACG CAACCATCAA ATCAATCAGA TGGCAATGAC GCACCACCGC CCCAGCAGCG 180
CGAACCGCCG ACGAAGAAGC CATGGACGCG CTCCTCGGCA AGACGCCCAA GGAACAGAAA 240
AAAGTAATCT CCGCACCCGT ATCAGAAGAC GACGCCTACC GCCGCGACGT CGAAGCCTCC 300
GGCGCGGTGT CCACGCTCCA GGATTACGAA GACATGCCCG TCGAGGAGTT TGGCGCCGCC 360
CTCCTCCNNN GCATGGGCTG GAACGGGGAA GCCCGCGGCC CGCCGGTCAA GCAGGTCAAG 420
[illegible] [illegible] [illegible] [illegible] [illegible] [illegible] 480
[illegible] [illegible] [illegible] [illegible] [illegible] [illegible] 540
AGGAGAGCAA GCGCAAGGAA [illegible] [illegible] [illegible] [illegible] 600
GCGAACGGAT CGCGAGAGGG [illegible] [illegible] [illegible] [illegible] 660
TAGGGATCGG GATAGGGATA [illegible] [illegible] [illegible] [illegible] 720
CTCTGACCGG CACCATCGAC [illegible] [illegible] [illegible] [illegible] 780
TTGAGACTAA CATTAACCAT [illegible] [illegible] [illegible] [illegible] 840
AAGCAAATAG GCGACAGGCG [illegible] TTAATATCAC [illegible] [illegible] 900
CGTCTTGGAG AAGACACGTA [illegible] [illegible] [illegible] [illegible] 960
AATTAGAATA TCAATGACAC [illegible] [illegible] [illegible] [illegible] 1020




TGATCTGCGA ATTTGTATGT GCTGCCTCTC CCTCTGACCT TCTGGTCTGG TGATACCATC 1080
CTCCCTCAGT TTGGATCATC GCCTTATTCT TCTTCCCTCT TCTGCATCTG CTTCCTGCTC 1140
GTTTGAGGAA CATCGCCAGC TGACTCTGCT TGCCTCGCAG CGATCTAGTC AAGAACAACA 1200
CNAGCTCTCA CGCTACATCA CACAAACCGT CAAAATGGGT AAGGAGGACA AGACTCACAT 1260
CAACGTGGTC GTCATCGTAC GTATTTTCCG ATCCCTCATC GGCNGTCATC TGNCCAGTCT 1320
GATTCCAAGA ATCACCGTGC TAACCATATA CCATCTANGG GTGCGTATTC CATCAATCAT 1380
CTTGAGCCAG ATCGACCGAA CATACGATAC TGACTTTGCT ACGACAGCCA CGTCGACTCC 1440
GGCAAGTCTA CCACCGTGAG TAAACACCCA TTCCACTCCA CGACCGCAAG CTCCATCTTG 1500
CGCGTGGCGT CTCTGCGATG AACATCCGAA ACTGACGTTC TGTTACAGAC TGGTCACTTG 1560


ATCTACCAGT GCGGTGGTAT [illegible] [illegible] [illegible] [illegible] 1620
[illegible] [illegible] [illegible] CTTTGCCCAT [illegible] [illegible] 1680
GAATGCTGTG CCGACACGAT [illegible] [illegible] [illegible] [illegible] 1740
AGCGACGCAA ATTTTTTTTG [illegible] [illegible] [illegible] [illegible] 1800
ACTACTGCTC TCTGGCCGCT [illegible] [illegible] [illegible] [illegible] 1860
AGCGATGCTA ACCATATTCC [illegible] [illegible] [illegible] [illegible] 1920
AGTACGCGTG GGTTCTTGAC [illegible] [illegible] [illegible] [illegible] 1980
TTGCCCTCTG GAAGTTCGAG [illegible] [illegible] [illegible] [illegible] 2040
CCATCACCTC ACTGCGTCGT [illegible] [illegible] [illegible] [illegible] 2100
CCACCGTGAC TTCATCAAGA [illegible] [illegible] [illegible] [illegible] 2160


CATCATCGCT GCCGGTACTG GTGAGTTCGA GGCTGGTATC TCCAAGGATG GCCAGACCCG 2220
TGAGCACGCT CTGCTCGCCT ACACCCTGGG TGTCAAGCAG CTCATCGTCG CCATCAACAA 2280
GATGGACACT GCCAACTGGG CCGAGGCTCG TTACCAGGAA ATCATCAAGG AGACTTCCAA 2340
CTTCATCAAG AAGGTCGGCT TCAACCCCAA GGCCGTTGCT TTCGTCCCCA TCTCCGGCTT 2400
CAACGGTGAC AACATGCTCA CCCCCTCCAC CAACTGCCCC TGGTACAAGG GCTGGGAGAA 2460
GGAGACCAAG GCTGGCAAGT TCACCGGCAA GACCCTCCTT GAGGCCATCG ACTCCATCGA 2520
GCCCCCCAAG CGTCCCACGG ACAAGCCCCT GCGTCTTCCC CTCCAGGACG TCTACAAGAT 2580
CGGTGGTATC GGAACAGTTC [illegible] [illegible] [illegible] [illegible] 2640
GGTCGTTACC TTCGCTCCCT [illegible] [illegible] [illegible] [illegible] 2700
CGAGCAGCTC GCTGAGGGCC [illegible] CAACGTTGGT [illegible] AGAACGTTTC 2760
CGTCAAGGAA ATCCGCCGTG GCAACGTTGC CGGTGACTCC AAGAACGACC CCCCCATGGG 2820
CGCCGCTTCT TTCACCGCCC [illegible] [illegible] [illegible] TCGGTGCCGG 2880


CTACGCCCCC GTCCTCGACT GCCACACTGC CCACATTGCC TGCAAGTTCG CCGAGCTCCT 2940
CGAGAAGATC GACCGCCGTA CCGGTAAGGC TACCGAGTCT GCCCCCAAGT TCATCAAGTC 3000
TGGTGACTCC GCCATCGTCA AGATGATCCC CTCCAAGCCC ATGTGCGTTG AGGCTTTCAC 3060
CGACTACCCT CCCCTGGGTC GTTTCGCCGT CCGTGACATG CGCCAGACCG TCGCTGTCGG 3120
TGTCATCAAG GCCGTCGAGA AGTCCTCTGC CGCCGCCGCN AAGGTCACCA AGTCCGCTGC 3180
CAAGGCCGCC AAGAAATAAG CGATACCCAT CATCAACACC TGATGTTCTG GGGTCCCTCG 3240
TGAGGTTTCT CCAGGTGGGC ACCACCATGC GCTCACTTCT ACGACGAAAC GATCAATGTT 3300
GCTATGCATG AGSACTCGAC TATGAATCGA GGCACGGTTA ATTGAGAGGC TGGGAATAAG 3360
GGTTCCATCA GAACTTCTCT GGGAATGCAA AACAAAAGGG AACAAAAAAA CTAGATAGAA 3420
GTGAATTCAT GACTTCGACA ACCAAAAAAA AAAAAAAAAA A 3461



(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1636 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

GGTCTGAAGG ACGTGGAATG ATGGACTTAA TGACAAGAGT TGCCTGGCTA TTGAGCTCTG 60
GTACATGGAT CTCGAACTGA GAGCGTACAA GTTACATGTA GTAAATCTAG TAGATCTCGC 120
TGAAAGCCCT CTTTCCCGGT AGAAACACCA CCAGCGTCCC GTAGGACAAG ATCCTGTCGA 180
TCTGAGCACA TGAATTGCTT CCCTGGATCT GGCGCTGCAT CTGTTTCCCC AGACAATGAT 240
GGTAGCAGCG CATGGAAGAA CCCGGTTGTT CGGAATGTCC TTGTGCTAAC AGTGGCATGA 300
TTTTACGTTG CGGCTCATCT CGCCTTGGCA CCGGACCTCA GCAAATCTTG TCACAACAGC 360
AATCTCAAAC AGCCTCATGG TTCCCAGATT CCCTGATTCA GAACTCTAGA GCGGCAGATG 420
TCAAACGATT CTGACCTAGT ACCTTGAGCA TCCCTTTCGG ATCCGGCCCA TGTTCTGCCT 480
GCCCTTCTGA GCACAGCAAA CAGCCCAAAA GGCGCCGGCC GATTCCTTTC CCGGGATGCT 540
CCGGAGTGGC ACCACCTCCC AAAACAAGCA ACCTTGAACC CCCCCCCCAA ATCAACTGAA 600
GCGCTCTTCG CCTAACCAGC ATAAGCCCCC CCCAGGATCG TTAGGCCAAG TGGTAGGGCC 660
AGCCAATTAG CGAGNGGCCA TTTGGAGGTC ATGGGCGCAG AATGTCCTGA CAGTGGTATG 720
ATATTGACTG CCCGGTGTGT GTGGCATCTG GCCATAATCG CAGGCTGAGG CGAGGAAGTC 780
TCGTGAGGAT GTCCCGACTT TGACATCATG AGGGAGTGAG AAACTGAAGA GAAGGAAAGC 840
TTCGAAGGTT CGATAAGGGA TGATTTGCAT GGCGGGCGAC AGGATGCGAT GGCTCGTTGG 900
GATACATAAT GCTTGGGTTG GAAGCGATTC CAGGTCGTCT TTTTTTGGTT CATCATCACA 960
GCATCAACAA GCAACGATAC AAGCAATCCA CTGAGGATTA CCTCTCAACT CAACCACTTT 1020
CCAAACCATC TCAACTCCCT AAGATTCTTT CAGTGTATTA TCACTAGGAT TTTTCCCAAG 1080
CCGGCTTCAA AACACACAGA TAAACCACCA ACTCTACAAC CAAAGACTTT TTGATCAATC 1140
CAACAACTTC TCTCAACATG TCTGCTGCAA CCGTCACCCG CACTGCAACC GCCGCTGTTC 1200
GCAGACCCGG CTTCTTCATG CAAGTCCGAC GGATGGGACG CTCATTCGAG CACCAGCCCT 1260
TTGAGCGACT CTCCGCCACC ATGAAGCCTG CACGACCCGA CTATGCTAAG CAAGTCGTCT 1320
GGACGGCTGG CAAGTTTGTC ACTTATGTTC CTCTTTTCGG CGCCATGCTT ACCTGGCCTG 1380
CGCTCGCCAA STGGGCTCTG GACGGACACA TCGGACGGTG GTAAAAGATC AGACTCTTGT 1440
CGAGGCAACG GGGAATAGAC AGGACAGCAA AAAAGATATC TCCGGATAGA AGTGTCCATC 1500
TTTCGACTTG TATATATATA TATGCTATAC TCTGGGGGCG TTTGGATGGA CTTTGGGCAC 1560
GAAGCATACT TTGGCGCAAC GCAGATACTT TAATCTGATT CCTTTTGTTA ATTCAAAAAA 1620
AAAAAAAAAA AAAAAA 1636
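Rows of this listing carry up to six 10-base groups followed by a running base count, so a transcribed row can be sanity-checked by comparing the number of bases read so far with the stated count. The sketch below is one hypothetical way to do this; it assumes each row has already been cleaned so that the count is a single integer at the end of the line, and the function names are not part of any standard tool.

```python
def parse_row(line):
    """Split a listing row into (sequence, stated running count)."""
    tokens = line.split()
    stated = int(tokens[-1])
    sequence = "".join(tokens[:-1]).upper()
    return sequence, stated


def check_rows(rows, start=0):
    """Return (row index, counted, stated) for rows whose count disagrees.

    start is the running count before the first row; after a mismatch the
    counter is reset to the stated value so later rows are checked
    independently of the earlier error.
    """
    running = start
    problems = []
    for index, line in enumerate(rows):
        sequence, stated = parse_row(line)
        running += len(sequence)
        if running != stated:
            problems.append((index, running, stated))
            running = stated
    return problems


if __name__ == "__main__":
    # The last two rows of SEQ ID NO:2, which begin after base 1560.
    rows = [
        "GAAGCATACT TTGGCGCAAC GCAGATACTT TAATCTGATT CCTTTTGTTA ATTCAAAAAA 1620",
        "AAAAAAAAAA AAAAAA 1636",
    ]
    print(check_rows(rows, start=1560))
```

A row in which a base was dropped or duplicated during transcription shows up as a mismatch between the counted and stated totals, although a substitution of one base for another is not detected by this check.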
(2) INFORMATION FOR SEQ ID NO:3: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2868 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:



TTTGTATGGC TGGATCTCGA AAGGCCCTTG TCATCGCCAA GCGTGGCTAA TATCGAATGA 60

GGGACACCCA CTTGCATATC TCCTGATCAT TCAAACGACA AGTGTGAGGT AGGCAATCCT 120

CGTATCCCAT TGCTGGGCTG AAAGCTTCAC ACGTATCGCA TAAGCGTCTC CAACCAGTGC 180

TTAGGTGACC CTTAAGGATA CTTACAGTAA GACTGTATTA AGTCAGTCAC TCTTTCACTC 240

GGGCTTTGAA TACGATCCTC AATACTCCCG ATAACAGTAA GAGGATGATA CAGCCTGCAG 300

TTGGCAAATG TAAGCGTAAT TAAACTCAGC TGAACGGCCC TTGTTGAAAG TCTCTCTCGA 360

TCAAAGCAAA GCTATCCACA GACAAGGGTT AAGCAGGCTC ACTCTTCCTA CGCCTTGGAT 420

ATGCAGCTTG GCCAGCATCG CGCATGGCCA ATGATGCACC CTTCACGGCC CAACGGATCT 480

CCCGTTAAAC TCCCCTGTAA CTTGGCATCA CTCATCTGTG ATCCCAACAG ACTGAGTTGG 540

GGGCTGCGGC TGGCGGATGT CGGAGCAAAG GATCACTTCA AGAGCCCAGA TCCGGTTGGT 600

CCATTGCCAA TGGATCTAGA TTCGGCACCT TGATCTCGAT CACTGAGACA TGGTGAGTTG 660

CCCGGACGCA CCACAACTCC CCCTGTGTCA TTGAGTCCCC ATATGCGTCT TCTCAGCGTG 720

CAACTCTGAG ACGGATTAGT CCTCACGATG AAATTAACTT CCAGCTTAAG TTCGTAGCCT 780

TGAATGAGTG AAGAAATTTC AAAAACAAAC TGAGTAGAGG TCTTGAGCAG CTGGGGTGGT 840

ACGCCCCTCC TCGACTCTTG GGACATCGTA CGGCAGAGAA TCAACGGATT CACACCTTTG 900

GGTCGAGATG AGCTGATCTC GACAGATACG TGCTTCACCA CAGCTGCAGC TACCTTTGCC 960

CAACCATTGC GTTCCAGGAT CTTGATCTAC ATCACCGCAG CACCCGAGCC AGGACGGAGA 1020

GAACAATCCG GCCACAGAGC AGCACCGCCT TCCAACTCTG CTCCTGGCAA CGTCACACAA 1080

CCTGATATTA GATATCCACC TGGGTGATTG CCATTGCAGA GAGGTGGCAG TTGGTGATAC 1140



WO 94/04673 



-46- 



PCT/FI93/00330 



CGACTGGCCA TGCAAGACGC GGCCGGGCTA GCTGAAATGT CCCCGAGAGG ACAATTGGGA 1200

GCGTCTATGA CGGCGTGGAG ACGACGGGAA AGGACTCAGC CGTCATGTTG TGTTGCCAAT 1260

TTGAGATTGT TGACCGGGAA AGGGGGGACG AAGAGGATGG CTGGGTGAGG TGGTATTGGG 1320

AGGATGCATC ATTCGACTCA GTGAGCGATG TAGAGCTCCA AGAATATAAA TATCCCTTCT 1380

CTGTCTTCTC AAAATCTCCT TCCATCTTGT CCTTCATCAG CACCAGAGCC AGCCTGAACA 1440

CCTCCAGTCA ACTTCCCTTA CCAGTACATC TGAATCAACA TCCATTCTTT GAAATCTCAC 1500

CACAACCACC ATCTTCTTCA AAATGAAGTT CTTCGCCATC GCCGCTCTCT TTGCCGCCGC 1560

TGCCGTTGCC CAGCCTCTCG AGGACCGCAG CAACGGCAAC GGCAATGTTT GCCCTCCCGG 1620

CCTCTTCAGC AACCCCCAGT GCTGTGCCAC CCAAGTCCTT GGCCTCATCG GCCTTGACTG 1680

CAAAGTCCGT AAGTTGAGCC ATAACATAAG AATCCTCTTG ACGGAAATAT GCCTTCTCAC 1740

TCCTTTACCC CTGAACAGCC TCCCAGAACG TTTACGACGG CACCGACTTC CGCAACGTCT 1800

GCGCCAAAAC CGGCGCCCAG CCTCTCTGCT GCGTGGCCCC CGTTGTAAG7 TGATGCCCCA 1860

GCTCAAGCTC CAGTCTTTGG CAAACCCATT CTGACACCCA GACTGCAGGC CGGCCAGGCT 1920

CTTCTGTGCC AGACCGCCGT CGGTGCTTGA GATGCCCGCC CGGGGTCAAG GTGTGCCCGT 1980

GAGAAAGCCC ACAAAGTGTT GATGAGGACC ATTTCCGGTA CTGGGAAAGT TGGCTCCACG 2040

TGTTTGGGCA GGTTTGGGCA AGTTGTGTAG ATATTCCATT CGTACGCCAT TCTTATTCTC 2100

CAATATTTCA GTACACTTTT CTTCATAAAT CAAAAAGACT GCTATTCTCT TTGTGACATG 2160

CCGGAAGGGA acaattgctc ttggtctctg TTATTTGCAA GTAGGAGTGG GAGATTCGCC 2220

TTAGAGAAAG TAGAGAAGCT GTGCTTGACC GTGGTGTGAC TCGACGAGGA TGGACTGAGA 2280

GTGTTAGGAT TAGGTCGAAC GTTGAAGTGT ATACAGGATC GTCTGGCAAC CCACGGATCC 2340

TATGACTTGA TGCAATGGTG AAGATGAATG ACAGTGTAAG AGGAAAAGGA AATGTCCGCC 2400

TTCAGCTGAT ATCCACGCCA ATGATACAGC GATATACCTC CAATATCTGT GGGAACGAGA 2460

CATGACATAT TTGTGGGAAC AACTTCAAAC AGCGAGCCAA GACCTCAATA TGCACATCCA 2520

AAGCCAAACA TTGGCAAGAC GAGAGACAGT CACATTGTCG TCGAAAGATG GCATCGTACC 2580

CAAATCATCA GCTCTCATTA TCGCCTAAAC CACAGATTGT TTGCCGTCCC CCAACTCCAA 2640

AACGTTACTA CAAAAGACAT GGGCGAATGC AAAGACCTGA AAGCAAACCC TTTTTGCGAC 2700

TCAATTCCCT CCTTTGTCCT CGGAATGATG ATCCTTCACC AAGTAAAAGA AAAAGAAGAT 2760

TGAGATAATA CATGAAAAGC ACAACGGAAA CGAAAGAACC AGGAAAAGAA TAAATCTATC 2820

ACGCACCTTG TCCCCACACT AAAAGCAACA GGGGGGGTAA AATGAAAT 2868
(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2175 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

AAAAAGCTAG AACGAGACGA TTCCGGCCCG GCAAACCAGG CCGAGTGACG GGAGCATTTC 60
CATGATTTCA CTCGGCAAAC TCTGGCTACA ATTTTCAGGC GGCGAGTTCC GATACAAGGG 120

AAATCTATTA CCCACAGACG AACGGGAATC GGTGATGAGT GGTTTCTTGT AAGTCAACAT 180

TGAGCTAGAT AATTCCGGGC GAGATCAAGA TGCCATACTT TGATTGATGA AAAATCAATG 240

TCAGGCGTAA GTCTCTTCAA GCTCGCCCAG TCCTCTGTAT GTAACAGCAA TCGCAATTCC 300

GAAATGTGCC GAGCCAATGG AACATGCGTG TCTTTCTCTT TTCACACACA TCCAGTTCGA 360

GAGTCTTCTC TTCATCGTTT CATCGAATCC CTTCCCCTCC AGCTATTCAC CCAGCCGAGC 420

CCTTCAGCGC ACCAGCGTAT GTATGTACCC TCGGCTAAGA CGCAACAGAA GCATCATCAA 480

TATACCTGAT GTACTACTAT CTACTATGAA GCCCAAAAAC CCCTTCGCAG CCCAAATGTA 540

ACCCAAGCAA CGAATCCCCA ATAAGAGACA ATCCTCAGTG ACCCCCAGAA GAGCACAGAA 600

TCGAGCTGGT CCTGGTGGGT CGCATTGAGA CCGGTGGAGA TGCGTTCGAT TCGACTGCCG 660

GAGCTCCCGG GAAGCCGGCA GATGGTCCCA TGCGATGCCC TGCACCGTTT TTGTGAATCG 720

TCGGCATCGC GAGAAGTGGC CTGCTATGAC GTCGCTTGCA GCTTGGCCGC TCTGTTCGAA 780

GTTTTTCGAT GTTTTTCTTC ATGCGGGAGA AAGAAAACAT CAGATGACAT GATTATCCGA 840

ATGGATGGCG GGAGTTATCG TGGTGACGGC TGCTTCATGA GATGAGTATA AATGAGCTTG 900

TTCGCTCAGC GTGTCATGGA TCTTGTCCAG CTCCAAAGCA TCGGCTTCAG CATCCATCCG 960

CTTGAACAGA CAGGCACCAG CTTGAATCAG AAGCATACCC TTGATTTGAT ACTCTCTTGG 1020

GAAAAAACAC CACCATCTGT GTAATACTTT GATACCCCCA AAGCTCAAAC GACCGCTTGT 1080

ACATACAATA ACACCGCCAC AATGTTCGCC AACTTGACGC ACGCTACCCT GCGATTCATC 1140

GCCTTCTTCA ACCACCTGAT GATCCTGGCC TCATCAGCCA TCGTCACCGG CCTCGTATCC 1200

TGGTTCCTCG ACAAGTACGA CTACCGCGGC GTGAACATTG TCTACCAGGA AGTCATCGTA 1260

TGTCCTCCCA AGCACCACAT CAAACACACC CCATACCTTG GCTCTCCTCA GCTCCGTCGA 1320

AGCACATAAT ACTAACGCAT GCAACAACTA GGCCACCATA ACTCTGGGCT TCTGGCTCGT 1380

TGGTGCCGTC TTGCCCCTCG TTGGCAGATA CCGCGGCCAC CTGGCCCCTC TCAACCTCAT 1440

CTTCTCCTAC CTCTGGCTCA CCTCTTTCAT CTTCTCCGCG CAGGACTGGA GCAGCGACAA 1500

GTGCAGCTTC GGCCAGCCTG GCGAGGGCCA CTGCAGCCGC AAGAAGGCCA TTGAATCCTT 1560

CAACTTTATC GCATTGTAAG TGCCTACAAG TAATTTGCTA TGTATATGGG AGAGAGAGAG 1620

AAGAAGAAGA ATATGGCTCT AACATGGCAT CTCTACAGCT TCTTCCTCCT CTGCAACACC 1680

CTGGTTGAGA TGCTCCTGCT CCGCGCCGAG TATGCTACCC CCGTTGCTGC TGCTCACAAC 1740

AAGGAGATTT CTGCCGGCCG CCCCTCTGAC AACTCTGTCT AAATAACAAT AGACATGCAT 1800


AliA lunn^-uu 


A f2H C C li PTT C 

Aun^*l*nt lit 




GCGAGTTCCT 


GATCCGTTGA 




18 6 0 


GACBBBBBCC GCGCTCGCAT GGTTCATCTG CTACAACAAC ACAATGACAA TCCGAACCAG 1920

TCAATAAACC TCGACAACAC GACGAGTACT TTTGCGGATA GAAAGATACC CATTACACAG 1980

GAGATCAAAT GGGGAAATTG GAAGTGTATG GATGGACGCC CGTGTATAAT GAGGTTGTGA 2040

ACGGGATGGG AGGCAATGAA TAATGGATAA TGAGGTAATG GATAGATTCG GTCGTTTTGA 2100

TACCACAGCT GCACTCTGCT CTACGTCTGT CATTAATGAT ACATACAAAT GATACCTTAT 2160

ACGCTAAAAA AAAAA 2175
(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2737 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:







nrrcinai An 


\»S- X XuXXXXX 


w x ^— x^wx x V— x 


Tr*A A A PTfrr 


c n 


IdflV* X V3X X X I* X 


XXX WUVl\« X X 


XXX V*\3> X 


czcz cz cz r'i"crz 

v<JS3^J\#A X X X w 


R nnn r 1 A TfiT 

UuuV9\#AX w X W 


X tfUWiuu X X 


x ^ u 




uun wv— Win 




zv A a c ft fzn c A 


vil lul l_ Aft X t\ 


CS A TTflft TCZT C 
VaH X X un lulL 


1 R fl 
J.O u 




(aUX J. i 1A1(«1 


I 1 (atalaGUAlala 


1 (JA 1 (si AX la 1 


Ai 1 lAxt-xAI 


Ail 1 LaUAAACj 


ji4 U 


AlliAlUCAl I* 


B /""TI-'B fB ITT* 


*"*f*B /"*i~"T*'T*T 
laUAt-Alala Li X 


l_ 1 (_lsxtaC(jl_x 


ffft TB B B Tr"P 


TGTTGGAGTG 


^ r\ ri 


flVlTIVl fT"V 






■ rirriirsnT 1 
WtaWlAUiW x 


x uLLUiutiub 




■>b U 






/2*P B TPB rz IV Tl^ 


B tj~" b wrrr B 




X X utMAa LLlLu 


A "9 fl 


fTf; fTfT CCZCl 

l» X WV« X wX w\JVJ 


A TfZTr , fZr2f2'T™P 

/I X U X X X 


*r ^"T^* , n , f2T , ^^*T , 

X ^* X ^* X X UX wx 


VJ ^\^VJ\3 X \3r\ X S3 


Tfym a t cz ace 


TRflPP Pfir*A A 


d ft n 

*t O v 




A A A C A Tf3 fTr" 


AAAATfiTARC" 




TTCT end A C A 

X X ^ X ^kUwWI 


n*r"T*ri ffiT A t* r* 

V*. X X *J V^Vj ln^- 




TTGAGAGACA AGCAGACTAC AGGGATGACG AGTAATACGA CAGAGCGATA CGA CA 600

ATACGACACA GCTAAGAAAA TAAAGGTATT AGTACTACTA ATTGATTACC TACTACCTAG 660

ATATATACTA TACCTTATAT TTTATATGTG TGTGTGTGTG TATGTATATG CCTTACCTTA 720

TGCTTCGCAA AGAAGAGAAA CTAAAACGCC TCCTGGCTAC CTACCTACCT CTACCTTGTA 780

AGAGATGGAA TAATGTGGCC GCGCGTAAAG TAGGTACTGG ATATACAGGT CCTGAACATG 840

GCCCTGAATC CTGCCAGGCA GCCACCTCAC CCCTTCCGCA GGTATTTATG TAGCCCACAG 900

CTCCTCCAGA GACGATGCCG AGATGCCTCA TGCAGTCTAC CTACAAAGCC AGCAGTTTCA 960

CGCTTGACTC TCACTCTTGA TTGAATTCCC TCCCTCCCAT AATACCAATT GGCGTTCAAC 1020


UAi XULUAtiL 


AtaAA i tab L. Ua 


LLLAACALCxA 


CQi 1 CGAGGCC 


ATGGCAAAGT 


C CATGTCCGA 


1080 


CTTTTTCAAG GACACGGCCC AAAAGCAGGA CTCGACCAAG CATGACTTTG TCCAAGCCTC 1140

GCACGGCATC ATGAGGGCCA TTGTCGAGCC GCTCGTCACC CAGATGGGCT TCCGCGAGAC 1200

CCTCACCGAG CCCGTCGTCT TGCTCGACAG CGCGTGCGGA GCGGGCGTGC TGACGCAGGA 1260

GGTGCAGGCG GCGCTGCCAA AGGAGCTTCT GGAGAGGAGC TCGTTTACGT GTGCGGACAA 1320

TGCCGAGGGC TTGGTGGACG TGGTGAAGAG GAGGATTGAT GAGGAGAAGT GGGTGAATGC 1380

AGAGGCCAAG GTCCTTGATG CCCTGGTGAG TATATACATA TATATCTATA TCTATATAGA 1440

TATATATATG CCTTTGACTC CCCCCTTTAC ATGTCCTACG GCTGCTGATT GATTGATTGA 1500

TGTGGTGATG GTGATGTCCC AGAACACGGG GCTCCCAGAC AACTCCTTCA CCCATGTGGG 1560

CATTGCCCTG GCACTGCACA TCATCCCCGA TCCAGATGCC GTCGTCAAAG GTAAACAATC 1620

ACCAGCGTCA CTGCAAAGAG AGATTACGGG ATATCATATA CTGAAACCAA AGCCCAGACT 1680

GCATCAGAAT GCTCAAGCCA GGCGGCATCT TTGGCGCATC GACATGGCCC AAGGCCAGCG 1740

CCGACATGTT CTGGATCGCC GACATGCGCA CCGCCCTGCA GTCGCTCCCC TTTGACGCGC 1800
CGCTGCCAGA CCCGTTCCCC ATGCAGCTGC ACACCTCGGG CCACTGGGAC GACGCCGCCT 1860

GGGTCGAGAA GCATCTCGTC GAGGATCTGG GGCTGGCCAA CGTCTGTGTG AGGGAGCCGG 1920

CGGGCGAGTA CAGCTTTGCG AGCGCGGACG AGTTCATGGC GACGTTTCAG ATGATGCTGC 1980

CGTGGATTAT GAAGACGTTT TGGAGCGAGG AGGTGAGGGA GAAGCATTCG GTCGACGAGG 2040

TCAAGGAGTT GGTGAAGAGG CATCTGgAGG ACAAGTATGG GGGGAAGGGA TGGACCATTA 2100 

AGTGGCGGGT GATTACCATG ACTGCGACTG CGAGCAAGTG AGGGAGGGCA TCTGCTCATG 2160 

ATTATGTGAC AGCGAGCCAG TAGAGAGCCA TATTGTTGTC TTCAGAATGT GAGGACCGTG 2220 

ATGGTTGGTG TTTGTTGGAG TGATAACTCG TGGGTGTTGC TATTTGCATG TGAGACGATG 2280 

AACCATGCGC ACCAGCCACA ATCACTGTCC CCCACCTTAC CTACCAACTT CAAGTTACCA 2340 

CCTTACCTTT ACCTGATCTA GCACTGTGGC GCAGCTTGGT TTGACTGCTA GGTACCTACC 2400 

TAGTAGTAAT CAGGTACATT CTTCATCCCT GTGTCCTGGT GTCGCAGTTG CAGCTTGTCT 2460 

TATCGCTGTG GCCACGCATC GAGTGGCAGC ATCTTCAACT TCAAGTCCCG TCGGTCGCAC 2520 

TCTGGCCACG TCGCAGATGG ATCGCAGCGG GATCTGAACC GCTCGCTCGG CAACTGATAC 2580 

CAAGTCAACA AACACACGAG ACGACGGGAC GCTGATATAA NKNNGAGGAG GGTAAGAGAA 2640 

CTCTACGAGG GGCGGAAACT TGGTCCGACA ATTTCCCTCC CATCTTCACC CTCGACTCGA 2700 

ACTCGAACTC GATAGCCGCA CCCTCGACCG ATTGCCC 2737 
(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 43 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
ACCGGAATTC ATATCTAGAG GAGCCCGCGA GTTTGGATAC GCC 43 

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
ACCGCCGCGG TTTGACGGTT TGTGTGATGT AGCG 34 
(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
TCTTCAAGAA TTGCTCGACC AATTCTCACG GTGAATGTAG G 41 
(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 73 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

RCACATCTAG AGGTGACCTA GGCATTCTGG CCACTAGATA TATATTTAGA AGGTTCTTGT 60

AGCTCAAAAG AGC 73 

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 38 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
GGGAATTCTC TAGAAACGCG TTGGCAAATT ACGGTACG 38 
(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 43 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
GGGAATTCGG TCACCTCTAA ATGTGTAATT TGCCTGCTTG ACC 43 
(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 73 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

GGGAATTCGG TCACCTCTAA ATGTGTAATT TGCCTGCTTG ACCGATCTAA ACTGTTCGAA 60

GCCCGAATGT AGG 73 

(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 45 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
GGGAATTCTT CTAGATTGCA GAAGCACGGC AAAGCCCACT TACCC 45 
(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
TAGCGAATTC TAGGTCACCT CTAAAGGTAC CCTGCAGCTC GAGCTAG 47

(2) INFORMATION FOR SEQ ID NO:15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:
GGGAATTCAT GATGCGCAGT CCGCGG 26 
(2) INFORMATION FOR SEQ ID NO:16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1588 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:



CCCCCCTATC TTAGTCCTTC TTGTTGTCCC AAAATGGCGC CCTCAGTTAC ACTGCCGTTG 60

ACCACGGCCA TCCTGGCCAT TGCCCGGCTC GTCGCCGCCC AGCAACCGGG TACCAGCACC 120

CCCGAGGTCC ATCCCAAGTT GACAACCTAC AAGTGTACAA AGTCCGGGGG GTGCGTGGCC 180

CAGGACACCT CGGTGGTCCT TGACTGGAAC TACCGCTGGA TGCACGACGC AAACTACAAC 240

TCGTGCACCG TCAACGGCGG CGTCAACACC ACGCTCTGCC CTGACGAGGC GACCTGTGGC 300

AAGAACTGCT TCATCGAGGG CGTCGACTAC GCCGCCTCGG GCGTCACGAC CTCGGGCAGC 360

AGCCTCACCA TGAACCAGTA CATGCCCAGC AGCTCTGGCG GCTACAGCAG CGTCTCTCCT 420

CGGCTGTATC TCCTGGACTC TGACGGTGAG TACGTGATGC TGAAGCTCAA CGGCCAGGAG 480

CTGAGCTTCG ACGTCGACCT CTCTGCTCTG CCGTGTGGAG AGAACGGCTC GCTCTACCTG 540

TCTCAGATGG ACGAGAACGG GGGCGCCAAC CAGTATAACA CGGCCGGTGC CAACTACGGG 600

AGCGGCTACT GCGATGCTCA GTGCCCCGTC CAGACATGGA GGAACGGCAC CCTCAACACT 660

AGCCACCAGG GCTTCTGCTG CAACGAGATG GATATCCTGG AGGGCAACTC GAGGGCGAAT 720

GCCTTGACCC CTCACTCTTG CACGGCCACG GCCTGCGACT CTGCCGGTTG CGGCTTCAAC 780

CCCTATGGCA GCGGCTACAA AAGCTACTAC GGCCCCGGAG ATACCGTTGA CACCTCCAAG 840
ACCTTCACCA TCATCACCCA GTTCAACACG GACAACGGCT CGCCCTCGGG CAACCTTGTG 900

AGCATCACCC GCAAGTACCA GCAAAACGGC GTCGACATCC CCAGCGCCCA GCCCGGCGGC 960

GACACCATCT CGTCCTGCCC GTCCGCCTCA GCCTACGGCG GCCTCGCCAC CATGGGCAAG 1020

GCCCTGAGCA GCGGCATGGT GCTCGTGTTC AGCATTTGGA ACGACAACAG CCAGTACATG 1080

AACTGGCTCG ACAGCGGCAA CGCCGGCCCC TGCAGCAGCA CCGAGGGCAA CCCATCCAAC 1140

ATCCTGGCCA ACAACCCCAA CACGCACGTC GTCTTCTCCA ACATCCGCTG GGGAGACATT 1200

GGGTCTACTA CGAACTCGAC TGCGCCCCCG CCCCCGCCTG CGTCCAGCAC GACGTTTTCG 1260

ACTACACGGA GGAGCTCGAC GACTTCGAGC AGCCCGAGCT GCACGCAGAC TCACTGGGGG 1320

CAGTGCGGTG GCATTGGGTA CAGCGGGTGC AAGACGTGCA CGTCGGGCAC TACGTGCCAG 1380

TATAGCAACG ACTACTACTC GCAATGCCTT TAGAGCGTTG ACTTGCCTCT GGTCTGTCCA 1440

GACGGGGGCA CGATAGAATG CGGGCACGCA GGGAGCTCGT AGACATTGGG CTTAATATAT 1500

AAGACATGCT ATGTTGTATC TACATTAGCA AATGACAAAC AAATGAAAAA GAACTTATCA 1560

AGCAAAAAAA AAAAAAAAAA AAAAAAAA 1588 
(2) INFORMATION FOR SEQ ID NO:17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1820 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17: 











CCGCGGACTG CGCATCATGT 1740

ATCGGAAGTT GGCCGTCATC TCGGCCTTCT TGGCCACAGC TCGTGCTCAG TCGGCCTGCA 1800

CTCTCCAATC GGAGACTCAC CCGCCTCTGA CATGGCAGAA ATGCTCGTCT GGTGGCACTT 1860

GCACTCAACA GACAGGCTCC GTGGTCATCG ACGCCAACTG GCGCTGGACT CACGCTACGA 1920

ACAGCAGCAC GAACTGCTAC GATGGCAACA CTTGGAGCTC GACCCTATGT CCTGACAACG 1980

AGACCTGCGC GAAGAACTGC TGTCTGGACG GTGCCGCCTA CGCGTCCACG TACGGAGTTA 2040

CCACGAGCGG TAACAGCCTC TCCATTGGCT TTGTCACCCA GTCTGCGCAG AAGAACGTTG 2100

GCGCTCGCCT TTACCTTATG GGCAGCGACA CGACCTACCA GGAATTCACC CTGCTTGGCA 2160

ACGAGTTCTC TTTCGATGTT GATGTTTCGC AGCTGCCGTA AGTGACTTAC CATGAACCCC 2220

TGACGTATCT TCTTGTGGGC TCCCAGCTGA CTGGCCAATT TAAGGTGCGG CTTGAACGGA 2280

GCTCTCTACT TCGTGTCCAT GGACGCGGAT GGTGGCGTGA GCAAGTATCC CACCAACACC 2340

GCTGGCGCCA AGTACGGCAC GGGGTACTGT GACAGCCAGT GTCCCCGCGA TCTGAAGTTC 2400

ATCAATGGCC AGGCCAACGT TGAGGGCTGG GAGCCGTCAT CCAACAACGC AAACACGGGC 2460

ATTGGAGGAC ACGGAAGCTG CTGCTCTGAG ATGGATATCT GGGAGGCCAA CTCCATCTCC 2520

GAGGCTCTTA CCCCCCACCC TTGCACGACT GTCGGCCAGG AGATCTGCGA GGGTGATGGG 2580

TGCGGCGGAA CTTACTCCGA TAACAGATAT GGCGGCACTT GCGATCCCGA TGGCTGCGAC 2640

TGGAACCCAT ACCGCCTGGG CAACACCAGC TTCTACGGCC CTGGCTCAAG CTTTACCCTC 2700

GATACCACCA


AGAAATU. ulac 


CGTTGTCACC 


CAGTCCGAGA 


CGTCGGGTGC 


CATCAACCGA 


£ 1 o o 


TA CTATGTC C 


AGAA rGCs ULil 


CACTTTCCAG 


CAGCCCAACG 


CCGAGCTTGG 


TAGTTACTCT 




GGCAACGAGC 


TwAACQAl GA 


TTACTGCACA 


GCTGAGGAGG 


CAGAATTCGG 


CGGATCCTCT 


^ q o n 


TTCTCAGACA 


AGGGCGGCCT 


GACTCAGTTC 


AAGAAGGCTA 


CCTCTGGCGG 


CATGGTTCTG 


294v 


GTCATGAGTC 


i (a 1 ubUA 1 l£A 


TGTGAGTTTG ATGGACAAAC 


ATGCGCGTTG ACAAAGAGTC 


i nn n 

jUUU 


AAGCAGCTGA 


CTGAGAItaA 1 


ACAGTACTAC 


GCCAACATGC 


TGTGGCTGGA 


CTCCACCTAC 


i nc n 
j Uo u 


t. t_4sAl~AAA U l-» 


AlaACC 1 1 1- 


CACACCCGGT 


GCCGTGCGCG 


GAAGCTGCTC 


CACCAGCTCC 


.5 X«S "J 


GGTGTCCCTG CTCAGGTCGA ATCTCAGTCT CCCAACGCCA AGGTCACCTT CTCCAACATC 3180

AAGTTCGGAC CCATTGGCAG CACCGGCAAC CCTAGCGGCG GCAACCCTCC CGGCGGAAAC 3240

CCGCCTGGCA CCACCACCAC CCGCCGCCCA GCCACTACCA CTGGAAGCTC TCCCGGACCT 3300

ACCCAGTCTC ACTACGGCCA GTGCGGCGGT ATTGGCTACA GCGGCCCCAC GGTCTGCGCC 3360

AGCGGCACAA CTTGCCAGGT CCTGAACCCT TACTACTCTC AGTGCCTGTA AAGCTCCGTG 3420

CGAAAGCCTG ACGCACCGGT AGATTCTTGG TGAGCCCGTA TCATGACGGC GGCGGGAGCT 3480

ACATGGCCCC GGGTGATTTA TTTTTTTTGT ATCTACTTCT GACCCTTTTC AAATATACGG 3540



(2) INFORMATION FOR SEQ ID NO:18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2211 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:



GAATTCTCAC GGTGAATGTA GGCCTTTTGT AGGGTAGGAA TTGTCACTCA AGCACCCCCA 60

ACCTCCATTA CGCCTCCCCC ATAGAGTTCC CAATCAGTGA GTCATGGCAC TGTTCTCAAA 120

TAGATTGGGG AGAAGTTGAC TTCCGCCCAG AGCTGAAGGT CGCACAACCG CATGATATAG 180

GGTCGGCAAC GGCAAAAAAG CACGTGGCTC ACCGAAAAGC AAGATGTTTG CGATCTAACA 240

TCCAGGAACC TGGATACATC CATCATCACG CACGACCACT TTGATCTGCT GGTAAACTCG 300

TATTCGCCCT AAACCGAAGT GCGTGGTAAA TCTACACGTG GGCCCCTTTC GGTATACTGC 360

GTGTGTCTTC TCTAGGTGCA TTCTTTCCTT CCTCTAGTGT TGAATTGTTT GTGTTGGGAG 420

TCCGAGCTGT AACTACCTCT GAATCTCTGG AGAATGGTGG ACTAACGACT ACCGTGCACC 480

TGCATCATGT ATATAATAGT GATCCTGAGA AGGGGGGTTT GGAGCAATGT GGGACTTTGA 540

TGGTCATCAA ACAAAGAACG AAGACGCCTC TTTTGCAAAG TTTTGTTTCG GCTACGGTGA 600

AGAACTGGAT ACTTGTTGTG TCTTCTGTGT ATTTTTGTGG CAACAAGAGG CCAGAGACAA 660

TCTATTCAAA CACCAAGCTT GCTCTTTTGA GCTACAAGAA CCTGTGGGGT ATATATCTAG 720

AGTTGTGAAG TCGGTAATCC CGCTGTATAG TAATACGAGT CGCATCTAAA TACTCCGAAG 780

CTGCTGCGAA CCCGGAGAAT CGAGATGTGC TGGAAAGCTT CTAGCGAGCG GCTAAATTAG 840

CATGAAAGGC TATGAGAAAT TCTGGAGACG GCTTGTTGAA TCATGGCGTT CCATTCTTCG 900

ACAAGCAAAG CGTTCCGTCG CAGTAGCAGG CACTCATTCC CGAAAAAACT CGGAGATTCC 960
TAAGTAGCGA TGGAACCGGA ATAATATAAT AGGCAATACA TTGAGTTGCC TCGACGGTTG 1020

CAATGCAGGG GTACTGAGCT TGGACATAAC TGTTCCGTAC CCCACCTCTT CTCAACCTTT 1080

GGCGTTTCCC TGATTCAGCG TACCCGTACA AGTCGTAATC ACTATTAACC CAGACTGACC 1140

GSACGTGTTT TGCCCTTCAT TTGGAGAAAT AATGTCATTG CGATGTGTAA TTTGCCTGCT 1200

TGACCGACTG GGGCTGTTCG AAGCCCGAAT GTAGGATTGT TATCCGAACT CTGCTCGTAG 1260

AGGCATGTTG TGAATCTGTG TCGGGCAGGA CACGCCTCGA AGGTTCACGG CAAGGGAAAC 1320

CACCGATAGC AGTGTCTAGT AGCAACCTGT AAAGCCGCAA TGCAGCATCA CTGGAAAATA 1380

CAAACCAATG GCTAAAAGTA CATAAGTTAA TGCCTAAAGA AGTCATATAC CAGCGGCTAA 1440

TAATTGTACA ATCAAGTGGC TAAACGTACC GTAATTTGCC AACGCGTTGT GGGGTTGCAG 1500

AAGCAACGGC AAAGCCCACT TCCCACGTTT GTTTCTTCAC TCAGTCCAAT CTCAGCTGGT 1560

GATCCCCCAA TTGGGTCGCT TGTTTGTTCC GGTGAAGTGA AAGAAGACAG AGGTAAGAAT 1620

GTCTGACTCG GAGCGTTTTG CATACAACCA AGGGCAGTGA TGGAAGACAG TGAAATGTTG 1680

ACATTCAAGG AGTATTTAGC CAGGGATGCT TGAGTGTATC GTGTAAGGAG GTTTGTCTGC 1740

CGATACGACG AATACTGTAT AGTCACTTCT GATGAAGTGG TCCATATTGA AATGTAAGTC 1800

GGCACTGAAC AGGCAAAAGA TTGAGTTGAA ACTGCCTAAG ATCTCGGGCC CTCGGGCTTC 1860

GGCTTTGGGT GTACATGTTT GTGCTCCGGG CAAATGCAAA GTGTGGTAGG ATCGACACAC 1920

TGCTGCCTTT ACCAAGCAGC TGAGGGTATG TGATAGGCAA ATGTTCAGGG GCCACTGCAT 1980

GGTTTCGAAT AGAAAGAGAA GCTTAGCCAA GAACAATAGC CGATAAAGAT AGCCTCATTA 2040

AACGAAATGA GCTAGTAGGC AAAGTCAGCG AATGTGTATA TATAAAGGTT CGAGGTCCGT 2100

GCCTCCCTCA TGCTCTCCCC ATCTACTCAT CAACTCAGAT CCTCCAGGAG ACTTGTACAC 2160

CATCTTTTGA GGCACAGAAA CCCAATAGTC AACCGCGGAC TGCGCATCAT G 2211 
(2) INFORMATION FOR SEQ ID NO:19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1137 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

GAATTCTCAC GGTGAATGTA GGCCTTTTGT AGGGTAGGAA TTGTCACTCA AGCACCCCCA 60

ACCTCCATTA CGCCTCCCCC ATAGAGTTCC CAATCAGTGA GTCATGGCAC TGTTCTCAAA 120

TAGATTGGGG AGAAGTTGAC TTCCGCCCAG AGCTGAAGGT CGCACAACCG CATGATATAG 180

GGTCGGCAAC GGCAAAAAAG CACGTGGCTC ACCGAAAAGC AAGATGTTTG CGATCTAACA 240

TCCAGGAACC TGGATACATC CATCATCACG CACGACCACT TTGATCTGCT GGTAAACTCG 300

TATTCGCCCT AAACCGAAGT GCGTGGTAAA TCTACACGTG GGCCCCTTTC GGTATACTGC 360

GTGTGTCTTC TCTAGGTGCA TTCTTTCCTT CCTCTAGTGT TGAATTGTTT GTGTTGGGAG 420

TCCGAGCTGT AACTACCTCT GAATCTCTGG AGAATGGTGG ACTAACGACT ACCGTGCACC 480

TGCATCATGT ATATAATAGT GATCCTGAGA AGGGGGGTTT GGAGCAATGT GGGACTTTGA 540
TGGTCATCAA ACAAAGAACG AAGACGCCTC TTTTGCAAAG TTTTGTTTCG GCTACGGTGA 600

AGAACTGGAT ACTTGTTGTG TCTTCTGTGT ATTTTTGTGG CAACAAGAGG CCAGAGACAA 660

TCTATTCAAA CACCAAGCTT GCTCTTTTGA GCTACAAGAA CCTGTGGGGT ATATATCTAG 720

TGGCCAGAAT GCCTAGGTCA CCTCTAGAGA GTTGAAACTG CCTAAGATCT CGGGCCCTCG 780

GGCTTCGGCT TTGGGTGTAC ATGTTTGTGC TCCGGGCAAA TGCAAAGTGT GGTAGGATCG 840

ACACACTGCT GCCTTTACCA AGCAGCTGAG GGTATGTGAT AGGCAAATGT TCAGGGGCCA 900

CTGCATGGTT TCGAATAGAA AGAGAAGCTT AGCCAAGAAC AATAGCCGAT AAAGATAGCC 960 

TCATTAAACG AAATGAGCTA GTAGGCAAAG TCAGCGAATG TGTATATATA AAGGTTCGAG 1020 

GTCCGTGCCT CCCTCATGCT CTCCCCATCT ACTCATCAAC TCAGATCCTC CAGGAGACTT 1080 

GTACACCATC TTTTGAGGCA CAGAAACCCA ATAGTCAACC GCGGACTGCG CATCATG 1137 

(2) INFORMATION FOR SEQ ID NO:20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2261 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:



GAATTCTCAC GGTGAATGTA GGCCTTTTGT AGGGTAGGAA TTGTCACTCA AGCACCCCCA 60

ACCTCCATTA CGCCTCCCCC ATAGAGTTCC CAATCAGTGA GTCATGGCAC TGTTCTCAAA 120

TAGATTGGGG AGAAGTTGAC TTCCGCCCAG AGCTGAAGGT CGCACAACCG CATGATATAG 180

GGTCGGCAAC GGCAAAAAAG CACGTGGCTC ACCGAAAAGC AAGATGTTTG CGATCTAACA 240

TCCAGGAACC TGGATACATC CATCATCACG CACGACCACT TTGATCTGCT GGTAAACTCG 300

TATTCGCCCT AAACCGAAGT GCGTGGTAAA TCTACACGTG GGCCCCTTTC GGTATACTGC 360

GTGTGTCTTC TCTAGGTGCA TTCTTTCCTT CCTCTAGTGT TGAATTGTTT GTGTTGGGAG 420

TCCGAGCTGT AACTACCTCT GAATCTCTGG AGAATGGTGG ACTAACGACT ACCGTGCACC 480

TGCATCATGT ATATAATAGT GATCCTGAGA AGGGGGGTTT GGAGCAATGT GGGACTTTGA 540

TGGTCATCAA ACAAAGAACG AAGACGCCTC TTTTGCAAAG TTTTGTTTCG GCTACGGTGA 600

AGAACTGGAT ACTTGTTGTG TCTTCTGTGT ATTTTTGTGG CAACAAGAGG CCAGAGACAA 660

TCTATTCAAA CACCAAGCTT GCTCTTTTGA GCTACAAGAA CCTGTGGGGT ATATATCTAG 720

TGGCCAGAAT GCCTAGGTCA CCTCTAAAGG TACCCTGCAG CTCGAGCTAG AGTTGTGAAG 780

TCGGTAATCC CGCTGTATAG TAATACGAGT CGCATCTAAA TACTCCGAAG CTGCTGCGAA 840

CCCGGAGAAT CGAGATGTGC TGGAAAGCTT CTAGCGAGCG GCTAAATTAG CATGAAAGGC 900

TATGAGAAAT TCTGGAGACG GCTTGTTGAA TCATGGCGTT CCATTCTTCG ACAAGCAAAG 960

CGTTCCGTCG CAGTAGCAGG CACTCATTCC CGAAAAAACT CGGAGATTCC TAAGTAGCGA 1020

TGGAACCGGA ATAATATAAT AGGCAATACA TTGAGTTGCC TCGACGGTTG CAATGCAGGG 1080

GTACTGAGCT TGGACATAAC TGTTCCGTAC CCCACCTCTT CTCAACCTTT GGCGTTTCCC 1140

TGATTCAGCG TACCCGTACA AGTCGTAATC ACTATTAACC CAGACTGACC GGACGTGTTT 1200
TGCCCTTCAT TTGGAGAAAT AATGTCATTG CGATGTGTAA TTTGCCTGCT TGACCGACTG 1260

GGGCTGTTCG AAGCCCGAAT GTAGGATTGT TATCCGAACT CTGCTCGTAG AGGCATGTTG 1320

TGAATCTGTG TCGGGCAGGA CACGCCTCGA AGGTTCACGG CAAGGGAAAC CACCGATAGC 1380 

AGTGTCTAGT AGCAACCTGT AAAGCCGCAA TGCAGCATCA CTGGAAAATA CAAACCAATG 1440 

GCTAAAAGTA CATAAGTTAA TGCCTAAAGA AGTCATATAC CAGCGGCTAA TAATTGTACA 1500 

ATCAAGTGGC TAAACGTACC GTAATTTGCC AACGCGTTTC TAGATTGCAG AAGCACGGCA 1560 

AAGCCCACTT ACCCACGTTT GTTTCTTCAC TCAGTCCAAT CTCAGCTGGT GATCCCCCAA 1620 

TTGGGTCGCT TGTTTGTTCC GGTGAAGTGA AAGAAGACAG AGGTAAGAAT GTCTGACTCG 1680 

GAGCGTTTTG CATACAACCA AGGGCAGTGA TGGAAGACAG TGAAATGTTG ACATTCAAGG 1740 

AGTATTTAGC CAGGGATGCT TGAGTGTATC GTGTAAGGAG GTTTGTCTGC CGATACGACG 1800

AATACTGTAT AGTCACTTCT GATGAAGTGG TCCATATTGA AATGTAAGTC GGCACTGAAC 1860 

AGGCAAAAGA TTGAGTTGAA ACTGCCTAAG ATCTCGGGCC CTCGGGCTTC GGCTTTGGGT 1920 

GTACATGTTT GTGCTCCGGG CAAATGCAAA GTGTGGTAGG ATCGACACAC TGCTGCCTTT 1980

ACCAAGCAGC TGAGGGTATG TGATAGGCAA ATGTTCAGGG GCCACTGCAT GGTTTCGAAT 2040

AGAAAGAGAA GCTTAGCCAA GAACAATAGC CGATAAAGAT AGCCTCATTA AACGAAATGA 2100

GCTAGTAGGC AAAGTCAGCG AATGTGTATA TATAAAGGTT CGAGGTCCGT GCCTCCCTCA 2160

TGCTCTCCCC ATCTACTCAT CAACTCAGAT CCTCCAGGAG ACTTGTACAC CATCTTTTGA 2220 

GGCACAGAAA CCCAATAGTC AACCGCGGAC TGCGCATCAT G 2261 
(2) INFORMATION FOR SEQ ID NO:21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1776 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:



CAATTCTCAC GGTGAATGTA GGCCTTTTGT AGGGTAGGAA TTGTCACTCA AGCACCCCCA 60

ACCTCCATTA CGCCTCCCCC ATAGAGTTCC CAATCAGTGA GTCATGGCAC TGTTCTCAAA 120

TAGATTGGGG AGAAGTTGAC TTCCGCCCAG AGCTGAAGGT CGCACAACCG CATGATATAG 180

GGTCGGCAAC GGCAAAAAAG CACGTGGCTC ACCGAAAAGC AAGATGTTTG CGATCTAACA 240

TCCAGGAACC TGGATACATC CATCATCACG CACGACCACT TTGATCTGCT GGTAAACTCG 300

TATTCGCCCT AAACCGAAGT GCGTGGTAAA TCTACACGTG GGCCCCTTTC GGTATACTGC 360

GTGTGTCTTC TCTAGGTGCA TTCTTTCCTT CCTCTAGTGT TGAATTGTTT GTGTTGGGAG 420

TCCGAGCTGT AACTACCTCT GAATCTCTGG AGAATGGTGG ACTAACGACT ACCGTGCACC 480

TGCATCATGT ATATAATAGT GATCCTGAGA AGGGGGGTTT GGAGCAATGT GGGACTTTGA 540

TGGTCATCAA ACAAAGAACG AAGACGCCTC TTTTGCAAAG TTTTGTTTCG GCTACGGTGA 600

AGAACTGGAT ACTTGTTGTG TCTTCTGTGT ATTTTTGTGG CAACAAGAGG CCAGAGACAA 660

TCTATTCAAA CACCAAGCTT GCTCTTTTGA GCTACAAGAA CCTTCTAAAT ATATATCTAG 720
TGGCCAGAAT GCCTAGGTCA CCTCTAAATG TGTAATTTGC CTGCTTGACC GACTGGGGCT 780

GTTCGAAGCC CGAATGTAGG ATTGTTATCC GAACTCTGCT CGTAGAGGCA TGTTGTGAAT 840

CTGTGTCGGG CAGGACACGC CTCGAAGGTT CACGGCAAGG GAAACCACCG ATAGCAGTGT 900

CTAGTAGCAA CCTGTAAAGC CGCAATGCAG CATCACTGGA AAATACAAAC CAATGGCTAA 960

AAGTACATAA GTTAATGCCT AAAGAAGTCA TATACCAGCG GCTAATAATT GTACAATCAA 1020

GTGGCTAAAC GTACCGTAAT TTGCCAACGC GTTTCTAGAT TGCAGAAGCA CGGCAAAGCC 1080

CACTTACCCA CGTTTGTTTC TTCACTCAGT CCAATCTCAG CTGGTGATCC CCCAATTGGG 1140

TCGCTTGTTT GTTCCGGTGA AGTGAAAGAA GACAGAGGTA AGAATGTCTG ACTCGGAGCG 1200

TTTTGCATAC AACCAAGGGC AGTGATGGAA GACAGTGAAA TGTTGACATT CAAGGAGTAT 1260

TTAGCCAGGG ATGCTTGAGT GTATCGTGTA AGGAGGTTTG TCTGCCGATA CGACGAATAC 1320

TGTATAGTCA CTTCTGATGA AGTGGTCCAT ATTGAAATGT AAGTCGGCAC TGAACAGGCA 1380

AAAGATTGAG TTGAAACTGC CTAAGATCTC GGGCCCTCGG GCTTCGGCTT TGGGTGTACA 1440

TGTTTGTGCT CCGGGCAAAT GCAAAGTGTG GTAGGATCGA CACACTGCTG CCTTTACCAA 1500

GCAGCTGAGG GTATGTGATA GGCAAATGTT CAGGGGCCAC TGCATGGTTT CGAATAGAAA 1560

GAGAAGCTTA GCCAAGAACA ATAGCCGATA AAGATAGCCT CATTAAACGA AATGAGCTAG 1620

TAGGCAAAGT CAGCGAATGT GTATATATAA AGGTTCGAGG TCCGTGCCTC CCTCATGCTC 1680

TCCCCATCTA CTCATCAACT CAGATCCTCC AGGAGACTTG TACACCATCT TTTGAGGCAC 1740

AGAAACCCAA TAGTCAACCG CGGACTGCGC ATCATG 1776 
(2) INFORMATION FOR SEQ ID NO:22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1776 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 



CAATTCTCAC GGTGAATGTA GGCCTTTTGT AGGGTAGGAA TTGTCACTCA AGCACCCCCA 60

ACCTCCATTA CGCCTCCCCC ATAGAGTTCC CAATCAGTGA GTCATGGCAC TGTTCTCAAA 120

TAGATTGGGG AGAAGTTGAC TTCCGCCCAG AGCTGAAGGT CGCACAACCG CATGATATAG 180

GGTCGGCAAC GGCAAAAAAG CACGTGGCTC ACCGAAAAGC AAGATGTTTG CGATCTAACA 240

TCCAGGAACC TGGATACATC CATCATCACG CACGACCACT TTGATCTGCT GGTAAACTCG 300

TATTCGCCCT AAACCGAAGT GCGTGGTAAA TCTACACGTG GGCCCCTTTC GGTATACTGC 360

GTGTGTCTTC TCTAGGTGCA TTCTTTCCTT CCTCTAGTGT TGAATTGTTT GTGTTGGGAG 420

TCCGAGCTGT AACTACCTCT GAATCTCTGG AGAATGGTGG ACTAACGACT ACCGTGCACC 480

TGCATCATGT ATATAATAGT GATCCTGAGA AGGGGGGTTT GGAGCAATGT GGGACTTTGA 540

TGGTCATCAA ACAAAGAACG AAGACGCCTC TTTTGCAAAG TTTTGTTTCG GCTACGGTGA 600

AGAACTGGAT ACTTGTTGTG TCTTCTGTGT ATTTTTGTGG CAACAAGAGG CCAGAGACAA 660

TCTATTCAAA CACCAAGCTT GCTCTTTTGA GCTACAAGAA CCTTCTAAAT ATATATCTAG 720



WO 94/04673 PCT/FI93/00330



TGGCCAGAAT GCCTAGGTCA CCTCTAAATG TGTAATTTGC CTGCTTGACC GATCTAAACT 780

GTTCGAAGCC CGAATGTAGG ATTGTTATCC GAACTCTGCT CGTAGAGGCA TGTTGTGAAT 840

CTGTGTCGGG CAGGACACGC CTCGAAGGTT CACGGCAAGG GAAACCACCG ATAGCAGTGT 900

CTAGTAGCAA CCTGTAAAGC CGCAATGCAG CATCACTGGA AAATACAAAC CAATGGCTAA 960

AAGTACATAA GTTAATGCCT AAAGAAGTCA TATACCAGCG GCTAATAATT GTACAATCAA 1020

GTGGCTAAAC GTACCGTAAT TTGCCAACGC GTTTCTAGAT TGCAGAAGCA CGGCAAAGCC 1080

CACTTACCCA CGTTTGTTTC TTCACTCAGT CCAATCTCAG CTGGTGATCC CCCAATTGGG 1140

TCGCTTGTTT GTTCCGGTGA AGTGAAAGAA GACAGAGGTA AGAATGTCTG ACTCGGAGCG 1200

TTTTGCATAC AACCAAGGGC AGTGATGGAA GACAGTGAAA TGTTGACATT CAAGGAGTAT 1260

TTAGCCAGGG ATGCTTGAGT GTATCGTGTA AGGAGGTTTG TCTGCCGATA CGACGAATAC 1320

TGTATAGTCA CTTCTGATGA AGTGGTCCAT ATTGAAATGT AAGTCGGCAC TGAACAGGCA 1380

AAAGATTGAG TTGAAACTGC CTAAGATCTC GGGCCCTCGG GCTTCGGCTT TGGGTGTACA 1440

TGTTTGTGCT CCGGGCAAAT GCAAAGTGTG GTAGGATCGA CACACTGCTG CCTTTACCAA 1500

GCAGCTGAGG GTATGTGATA GGCAAATGTT CAGGGGCCAC TGCATGGTTT CGAATAGAAA 1560

GAGAAGCTTA GCCAAGAACA ATAGCCGATA AAGATAGCCT CATTAAACGA AATGAGCTAG 1620

TAGGCAAAGT CAGCGAATGT GTATATATAA AGGTTCGAGG TCCGTGCCTC CCTCATGCTC 1680

TCCCCATCTA CTCATCAACT CAGATCCTCC AGGAGACTTG TACACCATCT TTTGAGGCAC 1740

AGAAACCCAA TAGTCAACCG CGGACTGCGC ATCATG 1776
(2) INFORMATION FOR SEQ ID NO:23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 745 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 

GGACCTACCC AGTCTCACTA CGGCCAGTGC GGCGGTATTG GCTACAGCGG CCCCACGGTC 60 

TGCGCCAGCG GCACAACTTG CCAGGTCCTG AACCCTTACT ACTCTCAGTG CCTGTAAAGC 120 

TCCGTGCGAA AGCCTGACGC ACCGGTAGAT TCTTGGTGAG CCCGTATCAT GACGGCGGCG 180 

GGAGCTACAT GGCCCCGGGT GATTTATTTT TTTTGTATCT ACTTCTGACC CTTTTCAAAT 240 

ATACGGTCAA CTCATCTTTC ACTGGAGATG CGGCCTGCTT GGTATTGCGA TGTTGTCAGC 300 

TTGGCAAATT GTGGCTTTCG AAAACACAAA ACGATTCCTT AGTAGCCATG CATTTTAAGA 360

TAACGGAATA GAAGAAAGAG GAAATTAAAA AAAAAAAAAA AACAAACATC CCGTTCATAA 420 

CCCGTAGAAT CGCCGCTCTT CGTGTATCCC AGTACCACGT CAAAGGTATT CATGATCGTT 480 

CAATGTTGAT ATTGTTCCGC CAGTATGGCT CCACCCCCAT CTCCGCGAAT CTCCTCTTCT 540 

CGAACGCGGT AGTGGCTGCT GCCAATTGGT AATGACCATA GGGAGACAAA CAGCATAATA 600

GCAACAGTGG AAATTAGTGG CGCAATAATT GAGAACACAG TGAGACCATA GCTGGCGGCC 660

TGGAAAGCAC TGTTGGAGAC CAACTTGTCC GTTGCGAGGC CAACTTGCAT TGCTGTCAAG 720

ACGATGACAA CGTAGCCGAG GACCC 745
(2) INFORMATION FOR SEQ ID NO:24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1627 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:



GGCGGTATTG GCTACAGCGG CCCCACGGTC TGCGCCAGCG GCACAACTTG CCAGGTCCTG 60

AACCCTTACT ACTCTCAGTG CCTGTAAAGC TCCGTGCGAA AGCCTGACGC ACCGGTAGAT 120

TCTTGGTGAG CCCGTATCAT GACGGCGGCG GGAGCTACAT GGCCCCGGGT GATTTATTTT 180

TTTTGTATCT ACTTCTGACC CTTTTCAAAT ATACGGTCAA CTCATCTTTC ACTGGAGATG 240

CGGCCTGCTT GGTATTGCGA TGTTGTCAGC TTGGCAAATT GTGGCTTTCG AAAACACAAA 300

ACGATTCCTT AGTAGCCATG CATCGGGATC CTTTAAGATA ACGGAATAGA AGAAAGAGGA 360

AATTAAAAAA AAAAAAAAAA CAAACATCCC GTTCATAACC CGTAGAATCG CCGCTCTTCG 420

TGTATCCCAG TACCACGGCA AAGGTATTTC ATGATCGTTC AATGTTGATA TTGTTCCCGC 480

CAGTATGGCT GCACCCCCAT CTCCGCGAAT CTCCTCTTCT CGAACGCGGT AGTGGCGCGC 540

CAATTGGTAA TGACCATAGG GAGACAAACA GCATAATAGC AACAGTGGAA ATTAGTGGCG 600

CAATAATTGA GAACACAGTG AGACCATAGC TGGCGGCCTG GAAAGCACTG TTGGAGACCA 660

ACTTGTCCGT TGCGAGGCCA ACTTGCATTG CTGTCAAGAC GATGACAACG TAGCCGAGGA 720

CTAGCCGCAG CTCACCGTAC CAGTATCGAG GATTGACGGC AGAATAGCAG TGGCTCTCCA 900

GGATTTGACT GGACAAAATC TTCCAGTATT CCCAGGTCAC AGTGTCTGGC AGAAGTCCCT 960

TCTCGCGTGC ANTCGAAAGT CGCTATAGTG CGCAATGAGA GCACAGTAGG AGAATAGGAA 1020

CCCGCGAGCA CATTGTTCAA TCTCCACATG AATTGGATGA CTGCTGGGCA GAATGTGCTG 1080

CCTCCAAAAT CCTGCGTCCA ACAGATACTC TGGCAGGGGC TTCAGATGAA TGCCTCTGGG 1140

CCCCCAGATA AGATGCAGCT CTGGATTCTC GGTTACNATG ATATCGCGAG AGAGCACGAG 1200

TTGGTGATGG AGGGACAGGA GGCATAGGTC GCGCAGGCCC ATAACCAGTC TTGCACAGCA 1260

TTGATCTTAC CTCACGAGGA GCTCCTGATG CAGAAACTCC TCCATGTTGC TGATTGGGTT 1320

GAGAATTTCA TCGCTCCTGG ATCGTATGGT TGCTGGCAAG ACCCTGCTTA ACCGTGCCGT 1380

GTCATGGTCA TCTCTGGTGG CTTCGTCGCT GGCCTGTCTT TGCAATTCGA CAGCAAATGG 1440

TGGAGATCTC TCTATCGTGA CAGTCATGGT AGCGATAGCT AGGTGTCGTT GCACGCACAT 1500

AGGCCGAAAT GCGAAGTGGA AAGAATTTCC CGGNTGCGGA ATGAAGTCTC GTCATTTTGT 1560

ACTCGTACTC GACACCTCCA CCGAAGTGTT AATAATGGAT CCACGATGCC AAAAAGCTTG 1620

TGCATGC 1627



(2) INFORMATION FOR SEQ ID NO:25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 91 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 

GGACTGGCAT CATGGCGCCC TCAGTTACAC TGCCGTTGAC CACGGCCATC CTGGCCATTG 60

CCCGGCTCGT CGCCGCCCAG CAACCGGGTA C 91 
(2) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 97 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 18..95

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:

AACCGCGGAC TGGCATC ATG GCG CCC TCA GTT ACA CTG CCG TTG ACC ACG 50
Met Ala Pro Ser Val Thr Leu Pro Leu Thr Thr
1 5 10

GCC ATC CTG GCC ATT GCC CGG CTC GTC GCC GCC CAG CAA CCG GGT 95
Ala Ile Leu Ala Ile Ala Arg Leu Val Ala Ala Gln Gln Pro Gly
15 20 25

AC 97

(2) INFORMATION FOR SEQ ID NO:27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:

Met Ala Pro Ser Val Thr Leu Pro Leu Thr Thr Ala Ile Leu Ala Ile
1 5 10 15

Ala Arg Leu Val Ala Ala Gln Gln Pro Gly
20 25
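
The CDS feature of SEQ ID NO:26 (positions 18..95) and the 26-residue peptide of SEQ ID NO:27 can be cross-checked by translating the nucleotides with the standard genetic code. The Python sketch below is illustrative only and is not part of the original listing. The helper name `translate` and the variable names are assumptions introduced here, and the sequence string is transcribed from the listing.

```python
from itertools import product

# Standard genetic code, built in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# SEQ ID NO:26 as transcribed from the listing (whitespace is removed below).
SEQ_ID_26 = "".join("""
AACCGCGGAC TGGCATC ATG GCG CCC TCA GTT ACA CTG CCG TTG ACC ACG
GCC ATC CTG GCC ATT GCC CGG CTC GTC GCC GCC CAG CAA CCG GGT
AC
""".split())

# The CDS feature is given as 18..95 (1-based, inclusive).
cds = SEQ_ID_26[18 - 1:95]
peptide = translate(cds)

print(len(SEQ_ID_26), len(cds), peptide)
```

Under these assumptions the transcribed sequence is 97 nucleotides long, the CDS spans 78 nucleotides, and the translation is MAPSVTLPLTTAILAIARLVAAQQPG, which matches the 26 residues of SEQ ID NO:27 residue for residue.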



(2) INFORMATION FOR SEQ ID NO:28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:

ACT ACG TAG TCG ACT 15
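
SEQ ID NO:28 is short enough to read by eye, but the same reading-frame check can be sketched in code. The Python sketch below is illustrative and is not part of the original listing. It scans the 15-mer in its first reading frame for a stop codon; the function name `first_stop_codon` is a hypothetical helper introduced here.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def first_stop_codon(dna: str, frame: int = 0) -> Optional[Tuple[int, str]]:
    """Return (1-based codon number, codon) of the first in-frame stop, or None."""
    dna = "".join(dna.split()).upper()  # drop the spaces used in the listing
    for n, i in enumerate(range(frame, len(dna) - 2, 3), start=1):
        codon = dna[i:i + 3]
        if codon in STOP_CODONS:
            return n, codon
    return None


SEQ_ID_28 = "ACT ACG TAG TCG ACT"
hit = first_stop_codon(SEQ_ID_28)
print(hit)
```

In the first reading frame the third codon, TAG, is a stop codon, consistent with the Thr Thr STOP Ser Thr reading shown for this oligonucleotide in the drawings.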
WHAT IS CLAIMED IS:

1. A method for cloning a promoter that is active in a desired
environmental condition, said method comprising:

a. exposing a host to said environmental condition;

b. extracting mRNA from said host;

c. preparing a cDNA bank from said mRNA;

d. detectably labelling a sample of said cDNA;

e. hybridizing said labelled cDNA to said cDNA bank;

f. selecting clones from said hybridization of step (e) on
the basis of the intensity of the hybridization;

g. determining the relative abundancy of said selected
clones in the cDNA bank of step (c);

h. identifying the most abundant clones of step (g); and

i. using the inserts of the clones of step (h) to identify and
clone the host promoter that was responsible for
expression of the corresponding mRNA under said
environmental condition.

2. The method of claim 1, wherein said condition is growth in
glucose-containing medium.



3. The method of claim 1, wherein the host is a filamentous fungus.

4. The method of claim 1, wherein the host is selected from the
group consisting of Trichoderma, Aspergillus, Claviceps
purpurea, Penicillium chrysogenum, Magnaporthe grisea,
Neurospora, Mycosphaerella spp., Colletotrichum trifolii, the
dimorphic fungus Histoplasma capsulatum, Nectria
haematococca (anamorph: Fusarium solani f. sp. phaseoli and
f. sp. pisi), Ustilago violacea, Ustilago maydis,
Cephalosporium acremonium, Schizophyllum commune,
Podospora anserina, Sordaria macrospora, Mucor
circinelloides, and Colletotrichum capsici.

5. The method of claim 4, wherein the host is Trichoderma.

6. The method of claim 5, wherein the host is T. reesei.

7. An isolated promoter capable of expression of an
operably-linked coding sequence in a fungal host grown on glucose.

8. The promoter of claim 7, wherein said promoter is cloned by
a method comprising:

a. exposing a host to said environmental condition;

b. extracting mRNA from said host;

c. preparing a cDNA bank from a first sample of said
mRNA;

d. detectably labelling a sample of said cDNA;

e. hybridizing said labelled cDNA to said cDNA bank;

f. selecting clones from said hybridization of step (e) on
the basis of the intensity of the hybridization;

g. determining the relative abundancy of said selected
clones in the cDNA bank of step (c);

h. identifying the most abundant clones of step (g); and

i. using the inserts of the clones of step (h) to identify and
clone the host promoter that was responsible for
expression of the corresponding mRNA under said
environmental condition.

9. The promoter of claim 7, wherein said host is a filamentous
fungus.

10. The promoter of claim 9, wherein said host is selected from the
group consisting of Trichoderma, Aspergillus, Claviceps
purpurea, Penicillium chrysogenum, Magnaporthe grisea,
Neurospora, Mycosphaerella spp., Colletotrichum trifolii, the
dimorphic fungus Histoplasma capsulatum, Nectria
haematococca (anamorph: Fusarium solani f. sp. phaseoli and
f. sp. pisi), Ustilago violacea, Ustilago maydis,
Cephalosporium acremonium, Schizophyllum commune,
Podospora anserina, Sordaria macrospora, Mucor
circinelloides, and Colletotrichum capsici.

11. The promoter of claim 10, wherein said host is Trichoderma.

12. The promoter of claim 11, wherein said host is selected from
the group consisting of T. reesei, T. harzianum,
T. longibrachiatum, T. viride, and T. koningii.

13. The promoter of claim 12, wherein said host is T. reesei.
14. The promoter of claim 13, wherein said promoter is the tef1
promoter.

15. The promoter of claim 14, wherein said tef1 promoter contains
promoter elements of the 1.2 kb sequence adjacent to the
translational start site of SEQ ID 1.

16. The promoter of claim 13, wherein said promoter is the
promoter of SEQ ID 2.

17. The promoter of claim 13, wherein said promoter is the
promoter of SEQ ID 3.

18. The promoter of claim 13, wherein said promoter is the
promoter of SEQ ID 4.

19. The promoter of claim 13, wherein said promoter is the
promoter of SEQ ID 5.

20. The promoter of claim 13, wherein said promoter is the
promoter of SEQ ID 6.

21. The promoter of claim 7, wherein said promoter is an altered
cbh1 promoter, such alteration decreasing the ability of glucose
to repress said cbh1 promoter.

22. The promoter of claim 21, wherein said native cbh1 promoter
has an altered mig-like sequence at approximately position -720
to -715.

23. The promoter of claim 22, wherein said mig-like sequence is
5'-GTGGGG.
24. The promoter of claim 22, wherein said altered mig-like
sequence is 5'-TCTAGA.

25. The promoter of claim 24, wherein said promoter is the cbh1
promoter of pMI-24.

26. The promoter of claim 21, wherein said native cbh1 promoter
has the sequence TCTAAA at position -1505 to -1500 and the
sequence TCTAGA at position -720 to -715.

27. The promoter of claim 22, wherein said native cbh1 promoter
has the sequence TCTAAA at position -1505 to -1500 and the
sequence TCTAAA at position -1001 to -996 and the sequence
TCTAGA at position -720 to -715.

28. A promoter, wherein said promoter is selected from the cbh1
promoter of the group consisting of pMLO16del5(11), pMI-24,
pMI-27, pMI-28, pMLO16del5(11), SEQ ID 19, SEQ ID 20,
SEQ ID 21 and SEQ ID 22.

29. A vector comprising the promoter of claim 7.

30. The vector of claim 29, wherein said promoter is operably
linked to a coding sequence.

31. The vector of claim 30, wherein said coding sequence encodes
an enzyme hydrolysing lignocellulose.

32. A host cell transformed with the vector of claim 31.
33. The vector of claim 32, wherein said vector is selected from
the group consisting of pTHN100B, pMLO16del5(11), pMI-24,
pMI-27, pMI-28.

34. A host cell transformed with the vector of claim 33.

35. A host cell transformed with the vector of claim 30.

36. The host cell of claim 35, wherein said cell is a fungal cell.

37. The host cell of claim 36, wherein said fungal cell is that of a
fungus selected from the group consisting of Trichoderma,
Aspergillus, Claviceps purpurea, Penicillium chrysogenum,
Magnaporthe grisea, Neurospora, Mycosphaerella spp.,
Colletotrichum trifolii, the dimorphic fungus Histoplasma
capsulatum, Nectria haematococca (anamorph: Fusarium solani
f. sp. phaseoli and f. sp. pisi), Ustilago violacea, Ustilago
maydis, Cephalosporium acremonium, Schizophyllum
commune, Podospora anserina, Sordaria macrospora, Mucor
circinelloides, and Colletotrichum capsici.

38. The host cell of claim 37, wherein said fungus is Trichoderma.

39. The host cell of claim 38, wherein said fungus is selected from
the group consisting of T. reesei, T. harzianum,
T. longibrachiatum, T. viride, and T. koningii.

40. The host cell of claim 39, wherein said fungus is T. reesei.
41. An enzyme composition produced by a method comprising:

a. growing the host cell of claim 35 in the presence of
glucose;

b. separating the host cell from the growth medium; and

c. using said growth medium of step (b) as the source of
the enzymes in said enzyme composition.
1 GGTCTGAAGG ACGTGGAATG

21 ATGGACTTAA TGACAAGAGT TGCCTGGCTA TTGAGCTCTG GTACATGGAT CTCGAACTGA 
81 GAGCGTACAA GTTACATGTA GTAAATCTAG TAGATCTCGC TGAAAGCCCT CTTTCCCGGT 
141 AGAAACACCA CCAGCGTCCC GTAGGACAAG ATCCTGTCGA TCTGAGCACA TGAATTGCTT 
201 CCCTGGATCT GGCGCTGCAT CTGTTTCCCC AGACAATGAT GGTAGCAGCG CATGGAAGAA 
261 CCCGGTTGTT CGGAATGTCC TTGTGCTAAC AGTGGCAIGA TTTTACGTTG CGGCTCATCT 
321 CGCCTTGGCA CCGGACCTCA GCAAATCTTG TCACAACAGC AATCTCAAAC AGCCTCATGG 
381 TTCCCAGATT CCCTGATTCA GAACTCTAGA GCGGCAGATG TCAAACGATT CTGACCTAGT 
441 ACCTTGAGCA TCCCTTTCGG ATCCGGCCCA TGTTCTGCCT GCCCTTCTGA GCACAGCAAA 
501 CAGCCCAAAA GGCGCCGGCC GATTCCTTTC CCGGGATGCT CCGGAGTGGC ACCACCfCCC 
561 AAAACAAGCA ACCTTGAACC CCCCCCCCAA ATCAACTGAA GCGCTCTTCG CCTAACCAGC 
621 ATAAGCCCCC CCCAGGATCG TTAGGCCAAG TGGTAGGGCC AGCCAATTAG~CGAGNGGCCA 
681 TTTGGAGG1"C ATGGGCGCAG AATGTCCTGA CAGTGGTATG ATATTGACTG CCCGGTGTGT 
741 GTGGCATCTG GCCATAATCG CAGGCTGAGG CGAGGAAGTC TCGTGAGGAT GTCCCGACTT 
801 TGACATCATG AGGGAGTGAG AAACTGAAGA GAAGGAAAGC TTCGAAGGTT CGATAAGGGA 
861 TGATTTGCAT GGCGGGCGAC AGGATGCGAT GGCTCGTTGG GATACATAAT GCTrGGGTTG 
921 GAAGCGATTC CAGGTCGTCT TTTTTTGGTT CATCATCACA GCATCAACAA GCAACGATAC 
981 AAGCAATCCA CTGAGGATTA CCTCTCAACT CAACCACTTT CCAAACCATC TCAACTCCCT 
1041 AAGATTCTTT CAGTGTATTA TCACTAGGAT TTTTCCCAAG CCGGCTTCAA AACACACAGA 
1101 TAAACCACCA ACTCTACAAC CAAAGACTTT TTGATCAATC CAACAACTTC TCTCAACATG 
1161 TCTGCTGCAA CCGTCACCCG CACTGCAACC GCCGCTGTTC GCAGACCCGG CTTCTTCAIG 
1221 CAAGTCCGAC GGATGGGACG CTCATTCGAG CACCAGCCCT TTGAGCGACT CTCCGCCACC 
1281 ATGAAGCCTG CACGACCCGA CTATGCTAAG CAAGTCGTCT GGACGGCTGG CAAGTTTGTC 
1341 ACTTATGTTC CTC7TTTCGG CGCCATGCTT ACCTGGCCTG CGCTCGCCAA STGGGCTCTG 
1401 GACGGACACA TCGGACGGTG GTAAAAGATC AGACTCTTGT CGAGGCAACG GGGAATAGAC 
1461 AGGACAGCAA AAAAGATATC TCCGGATAGA AGTGTCCATC TTTCGACTTG TATATATATA
1521 TATGCTATAC TCTGGGGGCG TTTGGATGGA CTTTGGGCAC GAAGCATACT TTGGCGCAAC 
1581 GCAGATACTT TAATCTGATT CCTTTTGTTA ATTCAAAAAA AAAAAAAAAA AAAAAA 

FIG.3A(Cont.) 
10 20 30 40 50 60

1 TTTGTATGGC TGGATCTCGA MGGCCCTTG TCATCGCCAA GCGTGGC7AA TA7CGAATGA 

61 GGGACACCGA CTTGCATATC TCCTGATCAT TCAAACGACA AGTGTGAGGT AGGCAATCCT 

121 CGTATCCCAT TGCTGGGCTG AAAGCTTCAC ACGTATCGCA TAAGCGTCTC CAACCAGTGC 

181 TTAGGTGACC CTTAAGGATA CTTACAGTAA GACTGTATTA AGTCAGTCAC TCTTTCACTC 

241 GGGCTTTGAA TACGATCCTC AATACTCCCG ATAACAGTAA GAGGATGATA CAGCCTGCAG 

301 TTGGCAAATG TAAGCGTAAT TAAACTCAGC TGAACCGCCC TTGTTGAAAG TCTCTCTCGA 

361 TCAAAGCAAA GCTATCCACA GACAAGGGTT AAGCAGGCTC ACTCTTCCTA CGCCTTGGAT 

421 ATGCAGCTTG GCCAGCATCG CGCATGGCCA ATGATGCACC CTTCACGGCC CAACGGATCT 

481 CCCGTTAAAC TCCCCTGTAA CTTGGCATCA CTCATCTGTG ATCCCAACAG ACTGAGTTGG 

541 GGGCTGCGGC TGGCGGATGT CGGAGCAAAG GATCACTTCA AGAGCCCAGA TCCGGTTGGT 

601 CCATTGCCAA TGGATCTAGA TTCGGCACCT TGATCTCGAT CACTGACACA TGGTGAGTTG 

661 CCCGGACGCA CCACAAGTCC CCCTGTGTCA TTGAGTCCCC ATATGCGTCT TCTCAGCGTG 

721 CAACTCTGAG ACGGATTAGT CCTCACGATG AAATTAACTT CCAGCTTAAG TTCGTAGCCT 

781 TGAATGAGTG AAGAAATTTC AAAAACAAAC TGAGTAGAGG TCTTGAGCAG CTGGGGTGGT

841 ACGCCCCTCC TCGACTCTTG GGACATCGTA CGGCAGAGAA TCAACGGATT CACACCTTTG

901 GGTCGAGATG AGCTGATCTC GACAGATACG TGCTTCACCA CAGCTGCAGC TACCTTTGCC 

961 CAACCATTGC GTTCCAGGAT CTTGATCTAC ATCACCGCAG CACCCGAGCC AGGACGGAGA 

1021 GAACAATCCG GCCACAGAGC AGCACCGCCT TCCAACTCTG CTCCTGGCAA CGTCACACAA 

1081 CCTGATATTA GATATCCACC 7GGGTGATTG CCATTGCAGA GAGGTGGCAG TTGGTGA7AC 

1141 CGACTGGCCA TGCAAGACGC GGCCGGGCTA GCT6AAATGT CCCCGAGAGG ACAATTGGGA 

1201 GCGTCTATGA CGGCGTGGAG ACGACGGGAA AGGACTCAGC CGTCATGTTG TGTTGCCAAT 

1261 TTGAGATTGT TGACCGGGAA AGGGGGGACG AAGAGGATGG CTGGGTGAGG TGGTATTGGG 

1321 AGGATGCATC ATTCGACTCA GTGAGCGATG TAGAGCTCCA AGAATATAAA TATCCCTTCT 

1381 CTGTCTTCTC AAAATCTCCT TCCATCTTGT CCTTCA7CAG CACCAGAGCC AGCCTGAACA 

1441 CCTCCAGTCA ACTTCCCTTA CCAGTACATC TGAATCAACA TCCATTCTTT GAAATCTCAC 

1501 CACAACCACC ATCTTCT7CA AAA7GAAG77 C77CGCCA7C GCCGC7CTCT 77GCCGCCGC 

1561 7GCCG77GCC CAGCCTC7CG AGGACCGCAG CAACGGCAAC GGCAATGTTT GCCC7CCCGG 

1621 CCTC7TCAGC AACCCCCAG7 GCTG7GCCAC CCAAGTCC7T GGCC7CA7CG GCC7TGAC7G 

1681 CAAAG7CCG7 AAG77GAGCC A7AACA7AAG AA7CC7C77G ACGGAAA7AT GCCT7CTCAC 

1741 TCCTTTACCC C7GAACAGCC 7CCCAGAACG TT7ACGACGG CACCGAC77C CGCAACGTCT 

1801 GCGCCAAAAC CGGCGCCCAG CCICTCTGCT GCG7GGCCCC CGTTGTAAGT TGATGCCCCA 

1861 GC7CAAGC7C CAG7CTT7GG CAAACCCATT CTGACACCCA GAC7GCAGGC CGGCCAGGCT 
1921 CTTCTGTGCC AGACCGCCGT CGGTGCTTGA GATGCCCGCC CGGGGTCAAG GTGTGCCCGT

1981 GAGAAAGCCC ACAAAGTGTT GATGAGGACC ATTTCCGGTA CTGGGAAAGT TGGCTCCACG

2041 TGTTTGGGCA GGTTTGGGCA AGTTGTGTAG ATATTCCATT CGTACGCCAT TCTTATTCTC 

2101 CAATATTTCA GTACACTTTT CTTCAtAAAT CAAAAAGACT GCTATTCTCT TTGTGACATG 

2161 CCGGAAGGGA ACAATTGCTC TTGGTCTCTG TTATTTGCAA GTAGGAGTGG GAGATTCGCC 

2221 TTAGAGAAAG TAGAGAAGCT GTGCTTGACC GTGGTGTGAC TCGACGAGGA TGGACTGAGA 

2281 GTGTTAGGAT TAGGTCGAAC GTTGAAGTGT ATACAGGATC GTCTGGCAAC CCACGGATCC 

2341 TATGACTTGA TGCAATGGTG AAGA7GAATG ACAGTGTAAG AGGAAAAGGA AATGTCCGCC 

2401 T7CAGCTGAT ATCCACGCCA ATGATACAGC GATATACCTC CAATATCTGT GGGAACGAGA 

2461 CATGACATAT TTGTGGGAAC AACTTCAAAC AGCGAGCCAA GACCTCAATA TGCACATCCA 

2521 AAGCCAAACA TTGGCAAGAC GAGAGACAGT CACATTGTCG TCGAAAGATG GCATCGTACC 

2581 CAAATCATCA GCTCTCATTA TCGCCTAAAC CACAGATTGT TTGCCGTCCC CCAACTCCAA

2641 AACGTTACTA CAAAAGACAT GGGCGAATGC AAAGACCTGA AAGCAAACCC TTTTTGCGAC 

2701 TCAATTCCCT CCTTTGTCCT CGGAATGATG ATCCTTCACC AAGTAAAAGA AAAAGAAGAT 

2761 TGAGATAATA CATGAAAAGC ACAACGGAAA CGAAAGAACC AGGAAAAGAA TAAATCTATC 

2821 ACGCACCTTG TCCCCACACT AAAAGCAACA GGGGGGGTAA AATGAAAT 

FIG.4A(Cont.) 
TGTCCTCCCA
CAAACACACC CCATACCTTG GCTC7CCTCA


GC7CCG7CGA 


1321 
ACTAACGCAT


GCAACAAC7A GGCCACCATA ACTCTGGGCT 


TC7GGC7CG7 


1381
1441 CTTCTCCTAC CTCTGGCTCA CCTCTTTCAT CTTCTDCGCG CAGGACTGGA GCAGCGACAA
1501 GTGCAGCTTC GGCCAGCCTG GCGAGGGCCA CTGCAGCCGC AAGAAGGCCA TTGAATCCTT 
1561 CAACTTTATC GCATTGTAAG TGCCTACAAG TAATTTGCTA TGTATA7GGG AGAGAGAGAG 
1621 AAGAAGAAGA ATATGGCTCT AACATGGCAT CTCTACAGCT TCTICC7CCT CTGCAACACC 
1681 CTGGTTGAGA TGCTCCTGCT CCGCGCCGAG TATGCTACCC CCGTTGCTGC TGCTCACAAC
1741 AAGGAGATTT CTGCCGGCCG CCCCTCTGAC AACTCTGTCT AAATAACAAT AGACATGCAT 
1801 AGATGAACGG AGACCACTTC TACTTTCTTT GCGAGT7CCT GATCCGTTGA CCTGCAGG7C 
1861 GACBBBBBCC GCGCTCGCAT GGTTCATCTG CTACAACAAC ACAATGACAA TCCGMCCAG 
1921 TCAATAAACC TCGACAACAC GACGAGTACT TTTGCGGATA GAAAGATACC CATTACACAG 
1981 GAGATCAAAT GGGGAAATTG GAAGTGTA7G GATGGAC6CC CGTGTATAAT GAGGTTGTGA 
2041 ACGGGATGGG AGGCAATGAA TAATGGATAA TGAGGTAATG GATAGATTCG GTCGTTTTGA 
2101 TACCACAGCT GCACTCTGCT CTACGTCTGT CATTAATGAT ACA7ACAAAT GATACCTTAT 
2161 ACGCTAAAAA AAAAA 
TT7CAAAC77 GGGGTTTCGT
GGACAGCCAA AGCC7CAC7A
GGTTTrATGT 77GGGGGAGG


241 


A7GA7CCA7G 


AG7CAGAC77 GCACAGGTTT 


301 


CGGG7GAGGT 


GGTGGATGGC ATTCAACCCA 


361 


AGCGA777G7 


ncccncGA gtattagatg 


421 


CTGCTCTCGG 


ATGTCGGGTT TCTCTTGTGT 


481 


AGAGAGCGAA 


AAACATGCTC AAAATGTAGC 


541 


TTGAGAGACA 


AGCAGACTAC AGGGA7GACG 
ATACGACACA


GCTAAGAAAA TAAAGGTATT 


661 


ATATATACTA 


TACCTTATAT 777A7A7G7G 


781 
AGAAGAGAAA C7AAAACGCC


781 


AGAGATGGAA 


TAATGFGGCC GCGCG7AAAG 
GCCCTGAATC


CTGCCAGGCA GCCACCTCAC 
CTCCTCCAGA


GACGATGCCG AGATGCCTCA 
CGCTTGACTC


TCACTC1TGA T7GAATTCCC 


1021 


GMTGCCAGC 


AGAATGGCCG CCCAACACGA 


1081 


CTTTTTCAAG 


GACACGGCCC AAAAGCAGGA 


1141 


GCACGGCA7C 


ATGAGGGCCA TTGTCGAGCC 


1201 


CCTCACCGAG 


CCCGTCGTCT TGCTCGACAG 


1261 


GGTGCAGGCG 


GCGCTGCCAA AGGAGCTTCT 


1321 


7GCCGAGGGC 


TTGGTGGACG TGGTGAAGAG 


1381 


AGAGGCCAAG 


GTCCTTGATG CCCTGGTGAG 


1441 


TATATATATG 


CCTTTGACTC CCCCCTTTAC 



FIG. 
GCTTGTTTTT CTCTCCTTCT TCAAACTGGC
GGGGC7T7TG GGGGCAfGTC TGCCAGGTCT 
CAAACAGGCA GTTGTCAATA GATTGATGTC 
TCATGTATGT ATTTATQTAT A7TTGCAAAG 
CTCG7GCGCT GGATAAATC7 7G7TGGAG7G 
CAGCAACAC7 TGCCCAGGGG GATG7AC7GC 
ATGA7GCCGA ACAGACAAAT 77GAGCC7CG 
GCCGG7GA7G TG7GA7GGCC 7GGCCCGCAA 
ACACGGCGAC T7C7CGGACA C77GCG7ACC 
AGIAA7ACGA CAGAGCGA7A CGACACAGC7 
AG7AC7AC7A A77GA77ACC 7AC7ACC7AG 
7G7G7G7G7G 7A7G7A7ATG CC77ACC7TA 
7CC7GGC7AC C7ACC7ACC7 C7ACC77G7A 
7AGG7AC7GG A7A7ACAGG7 CC7GAACA7G 
CCC7TCCGCA GGTA7T7A7G 7AGCCCACAG 
TGCAG7C7AC C7ACAAAGCC AGCAG777CA 
TCCC7CCCA7 AA7ACCAAT7 GGCG7TCAAC 
CG7CGAGGCC A7GGCAAAGT CCA7G7CCGA 
C7CGACCAAG CATGACTTTG 7CCAAGCC7C 
GC7CG7CACC CAGA7GGGC7 TCCGCGAGAC 
CGCG7GCGGA GCGGGCG7GC TGACGCAGGA 
GGAGAGGAGC 7CG77TACG7 G7GCGGACAA 
GAGGA77GAT GAGGAGAAG7 GGG7GAATGC 
7ATA7ACA7A 7A7A7C7A7A TC7A7A7AGA 
ATG7CC7ACG GCTGCTGAFT GAT7GA77GA 
1501 7G7GG7GA7G CTGATGTCCC AGAACACGGG GCTCCCAGftC AAC7CC77CA CCCA7G7GGG
1561 CA7TGCCCTG GCACTGCACA TCATCCCCGA TCCAGA7GCC G7CCTCAAAG G7AAACAA7C 
1621 ACCAGCGTCA C7GCAAAGAG AGATTACGGG A7A7CA7A7A CTGAAACCAA AGCCCAGACT 
1681 GCATCAGAAT GCTCAAGCCA GGCGGCATCT TTGGCGCATC GACATGGCCC AAGGCCAGCG 
1741 CCGACATGTT CTGGATCGCC GACA7GCGCA CCGCCCTGCA GTCGCTCCCC TTTGACGCGC 
1801 CGCTGCCAGA CC03TTCCCC AT6CAGCTGC ACACC7CGGG CCAC76GGAC GACGCCGCCT 
1861 GGGTCGAGAA GCATCTCGTC GAGGATCTGG GGC7GGCCAA CGTCTGTGTG AGGGAGCCGG 
1921 CGGGCGAG7A CAGC7TTGCG AGCGCGGACG AGTTCATGGC GACGTTTCAG ATGATGCTGC 
1981 CGTGGATTAr GAAGACG77T TGGAGCGAGG AGGTGAGGGA GAAGCATTCG GTCGACGAGG 
2041 7CAAGGAGT7 GGTGAAGAGG CATC7GGAGG ACAAGTATGG GGGGAAGGGA TGGACCATTA~ 
2101 A&TGGCGGGT GATTACCATG ACTGCGACTG CGAGDAAGTG AGGGAGGGCA TCTGCTCATG 
2161 ATTATGTGAC AGCGAGCCAG 7AGAGAGCCA TATTG7TG7C 77CAGAA7GT GAGGACtGTG 
2221 ArGGlTGGTG T77G7TGGAG 7GA7AAC7CG 7GGG7GT7GC 7A77TGCA7G 7GAGACGA7G 
2281 AACCA7GDGC ACCAGCCACA A7CAC7G7CC CCCADC77AC C7ACCAAC7T CAAG77ACCA 
2341 CG7TACC777 ACCTGA7C7A GCAC7GTG6C GCAGC77GG7 T7GAC7GCTA GG7ACC7ACC 
2401 7AG7AGTAA7 CAGG7ACA77 C77CA7CCC7 G7G7CC7GG7 G7CGCAG77G CAGC77G7C7 
2461 TA7CGCTGTG GCCACGCA7C GAGTGGCAGC A7C77CAAC7 7CAAG7CCCG 7CGG7CGCAC 
2521 7C7GGCCACG 7CGCAGA7GG A7CGCAGCGG GA7C7GAACC GC7CGC7CGG CAAC7GA7AC 
2581 CAAG7CAACA AACACACGAG ACGACGGGAC GC7GA7A7AA MGAGGAG GG7AAGAGAA 
2641 C7C7ACGAGG GGCGGAAAC7 7GG7CCGACA An7CCC7CC CA7C77CACC C7CGAC7CGA 
2701 AC7CGAAC7C GA7AGCCGCA CCC7CGACCG AT7GCCC 
COTCCTATC 77AG7CC77C 77GT7G7CCC AAAAJGGCGC CCTCAG7TAC AC7GCCG77G
ADCACGGCCA 7CCTGGCCA7 7GCCCGDCTC GTCGCCGCCC AGCAACCGGG TACCAGCACC 
CCCGAGG7CC ATCCCAAG7T GACAACC7AC AAGIG7ACAA AG7CCGGGGG GIGCGTGGDC 
CAGGACACCT CGGTGGTCC7 7GAC7GGAAC 7ACCGC7GGA 7GCACGACGC AAAC7ACAAC 
TCGTGCACCG TCAACGGCGG CG7CAACACC ACGC7C7GCC C7GACGAGGC GACCTGTGGC 
AAGAACTGCT 7CA7CGAGGG CGICGACTAC GCCGCCTCGG GCGTCACGAC C7CGGGCAGC 
AGCC7CACCA TGAACCAGTA CATGCtXAGC AGC7C7GGCG GC7ACAGCAG CGTCTC7CCT 
CGGCTGTATC TCCTGGACTC TGACGGTGAG 7ACG7GA7GC 7GAAGC7CAA CGGCCAGGAG 
CTGAGCnCG ACGTCGACCT CTCTGCTCTG CCGTGTGGAG AGAACGGC7C GC7C7ACC7G 
TCTCAGATGG ACGAGAACGG GGGCGCCAAC CAGTA7AACA CGGCCGGTGC CAAC7ACGGG 
AGCGGCTACT GCGA7GCTCA GTGCCCCGTC CAGACA7GGA GGAACGGCAC CCTCAACAC7 
AGCCACCAGG GCT7C7GC7G CAACGAGA7G GA7ATCC7GG AGGGCAACTC GAGGGCGAA7 
GCCTTGACCC CTCAC7CT7G CAC6GCCACG GCC7GCGAC7 CTGCCGGTTG CGGCTTCAAC 
CCCTATGGCA GCGGCTACAA AAGC7ACTAC GGCCCCGGAG ATACCG77GA CACC7CCAAG 
ACCHCACCA TCATCACCCA G7TCAACACG GACAACGGCT CGCCCTCGGG CAACCTTGTG 
AGCATCACCC GCAAGTACCA GCAAAACGGC GTCGACATCC CCAGCGCCCA GCCCGGCGGC 
GftCACCATCT CGTCCTGCCC GTCCGCCTCA GCC7ACGGCG GCCTCGCCAC CATGGGCAAG 
GCCCrGAGCA GCGGCATGGT GCTCGTGTTC AGCAT77GGA ACGACAACAG CCAGTACATG 
AACTGGCTCG ACAGCGGCAA CGCCGGCCCC TGCAGCAGCA CCGAGGGCAA CCCATCCAAC 
ATCCTGGCCA ACAACCCCAA CACGCACGTC GTCTTCTCCA ACATCCGCTG GGGAGACATT 
GGGTCTACTA CGAACTCGAC TGCGCCCCCG CCCCCGCCTG CGTCCAGCAC GACGTTTTCG 
ACTACACGGA GGAGCTCGAC GACTTCGAGC AGCCCGAGCT GCACGCAGAC TCACTGGGGG 
CAGTGCGGTG GCATTGGGTA CAGCGGGTGC AAGACGTGCA CGTCGGGDAC TACGTGCCAG 
TATAGCAACG ACTACTACTC GCAATGCCTT 7AGAGC6TTG ACTTGCCTCT GGTC7GTCCA 
GACGGGGGCA CGA7AGAA7G CGGGCACGCA GGGAGCTCGT AGACA77GGG CT7AATA7A7 
AAGACATGC7 ATG1TGTATC TACAT7AGCA AATGACAAAC AAATGAAAAA GAAC77A7CA 
AGCAAAAAAA AAAAAAAAAA AAAAAAAA 
GGACCTACCC AGTCTCACTA
TGCGCCAGCG GCACAACTTG 
TCCGTGCGAA AGCCTGACGC 
GGAGCTACAT GGCCCCGGGT 
A7ACGGTCAA CTCATCTTTC 
TTGGCAAATT GTGGC1TTCG 
TAACGGAATA GAAGAAAGAG 
CCCGTAGAAT CGCCGCTCTT 
CAATGTTGAT ATTGTTCCGC 
CGAACGCGGT AGTGGCTGCT 
GCAACAGTGG AAATTAGTGG 
TGGAAAGCAC TGTTGGAGAC 
ACGATGACAA CGTAGCCGAG 



CGGCCAGTGC GGCGGTATTG 
CCAGGTCCTG AACCCTTACT 
ACCGGTAGAT TCTTGGTGAG 
GATnATTTT TTTTGTATCT 
ACTGGAGATG CGGtXTGCTT 
AAAACACAAA ACGATTCCTT 
GMATTAAAA AAAAAAAAAA 
CGTGTATCCC AGTACCACGT 
CAGTATGGCT CCACCCCCAT 
GCCAATTGGT AATGACCATA 
CGCAATAATT GAGAACACAG 
CAACTTGTCC GTTGCGAGGC 
GACCC 

FIG.7B 



GCTACAGCGG CCCCACGGTC 
ACTCTCAGTG CCTGTAAAGC 
CCCGTATCAT GACGGCGGCG 
ACTTCTGACC CTTTTCAAAT 
GGTATTGCGA TGTTGTCAGC 
AGTAGCCATG CATTTTAAGA 
AACAAACATC CCGTTCATAA 
CAAAGGTA7T CATGATCGTT 
CTDCGCGAAT CTCCTCTTCT 
GGGAGACAAA CAGCATAATA 
TGAGACCATA GCTGGCGGCC 
CAACTTGCAT TGCTGTCAAG 
Thr Thr STOP Ser Thr
... ACT ACG TAG TCG ACT ...
AGGCATGTTG TGAATCTGTG


TCGGGCAGGA 


CACGCCTCGA 


AGGTTCACGG CAAGGGAAAC 


1320 


CACCGATAGC AGTGTCTAGT 


AGCAACCTGT 
2100


GCCTCCCTCA TGCTCTCCCC 


ATCTACTCAT 


CAACTCAGAT 


CCTCCAGGAG ACTTGTACAC 


2160 


CATCTTTTGA GGCACAGAAA 


CCCAATAGTC 


AAtXGCGGjftC 


TGCGCATdATGl 
GGCGGTATTG


GCTACAGCGG CCCCACGGTC TGCGCCAGCG 


GCACAAC77G CCAGGTCC7G 
AACCCTTACT


ACTCTCAGTG CCTGTAAAGC TCCGTGCGAA 


AGCC7GACGC ACCGGTAGA7 


120 


7CTTGGIGAG 


CCCGTATCAT GACGGCGGCG GGAGCTACAT 


GGCCCCGGG7 GA7T7ATT77 


180 


TTTTGTATCT 


AC77C7GACC CTTTTCAAAT A7ACGG7CAA 


C7CA7C7nC AC7GGAGA7G 


240 


CGGCCTGCTT 


GG7AT7GCGA 7G77G7CAGC TTGGCAAATT 


G7GGC77TCG AAAACACAAA 


300 


ACGATTCC7T 


NsiI BamHI

AGTAGCCATG CATCGGGATC CTTTAAGATA 


AC&GAA7AGA AGAAAGAGGA 


360 


AATTAAAAAA 


AAAAAAAAAA CAAACATCCC GTTCATAACC 


CGTAGAATCG CCGC7CTFCG 


420 


TGTATCCCAG 


TACCACGGCA AAGG7A77TC ATGATCGTTC 


AATG7TGATA 77G77DCCGC 


480


CAGTATGGCT 


GCACCCCCAT CTCCGCGAAT CTCCTC7TCT 


CGAACGCGG7 AG7GGCGCGC 


540 


CAATTGGTAA 


7GACCATAGG GAGACAAACA GCA7AATAGC 


AACAG7GGAA A77AGTGGCG 


600 


CAATAATTGA 


GAACACAGTG AGACCATAGC 7GGCGGCC7G 


GAAAGCAC7G 77GGAGACCA 


660 


ACTTGTCCGT 


TGCGAGGCCA AC77GCA7TG C7G7CAAGAC 


GA7GACAACG TAGCCGAGGA 


720 


CCGTCACAAG 


GGACGCAAAG 7TG7CGCGGA 7GAGGTC7CC 


G7AGA7GGCA 7AGCCGGCAA 
TCCGAGAGTA


GCCTCTCAAC AGGTGGCC7T 77CGAAACCG 


G7AAACC77G 77CAGACG7C 
GGATTTGACT


GGACAAAATC 7TCCAGTA7T CCCAGG7CAC 
960


TCTCGCGTGC 


AN7CGAAAGT CGC7A7AG7G CGCAA7GAGA 


GCACAG7AGG AGAATAGGAA 


1020 


CCCGCGAGCA 


CA7T67TCAA 7C7CCACA7G AATTGGA7GA 


CTGCTGGGCA GAA7GTGC7G 


1080 


CCTCCAAAAT 


CC7GCG7CCA ACAGA7AC7C TGGCAGGGGC 


77CAGA7GAA TGCC7CTGGG 


1140 


CCCCCAGATA 


AGATGCAGC7 CTGGAT7C7C GGTTACNA7G 


A7A7CGCGAG AGAGCACGAG 


1200


TTGGTGATGG 


AGGGACAGGA GGCATAGG7C GCGCAGGCCC 


ATAACCAG7C 77GCACAGCA 


1260 


TTGATCTTAC 


C7CACGAGGA GC7CC7GA7G CAGAAAC7CC 


TCCATGTT6C 7GA77GGG77 


1320 
GAGAA7T7CA TCGC7CC7GG A7CGTATGGT 7GC7GGCAAG ACCCTGCTTA ADCG7GCCG7 1380

GTCATGGTCA TCTCTGGTGG CTTCGTCGCT GGCCTGTCTT TGCAATTCGA CAGCAAATGG 1440 

TGGAGATCTC TCTATCGTGA CAGTCATGGT AGCGATAGCT AGGTG7CGTT GCACGCACAT 1500 

AGGCCGAAAT GCGAAGTGGA AAGAATTTCC CGGNTGCGGA ATGAAG7CTC GTCATTTTGT 1560 

BamHI

ACTCGTACTC GACACCTCCA CCGAAGTGTT AATAAT GGAT CC ACGATGCC AAAAAGCTTG 1620

SphI

TGCATGC 1627
10 20 30 40 50 60

1 GAATTCTCAC GGTGAATGtA GGCCTTTTGT AGGGTAGGAA TTGTCACTCA AGCACCCCCA 

61 ACCTCCATTA CGCCTCCCCC ATAGAGTTCC CAATCAGTGA GTCATGGCAC TGTTCTCAAA 

121 TAGATTGGGG AGAAGTTGAC TTCCGCCCAG AGCTGAAGGT CGCACAACCG CATGATATAG 

181 GGTCGGCAAC GGCAAAAAAG CACGTGGCTC ACCGAAAAGC AAGATGTTTG CGATCTAACA 

241 TCCAGGAACC TGGATACATC CATCATCACG CACGACCACT TTGATCTGCT GGTAAACTCG 

301 TATTCGCCCT AAACCGAAGT GCGTGGTAAA TCTACACGTG GGCCCCTTTC GGTATACTGC 

361 GTGTGTCTTC TCTAGGTGCA TTCTTTCCTT CCTCTAGTGT TGAATTGTTT GTGTTGGGAG 

421 TCCGAGCTGT AACTACCTCT GAATCTCTGG AGAATGGTGG ACTAACGACT ACCGTGCACC 

481 TGCATCATGT ATATAATAGT GATCCTGAGA AGGGGGGTTT GGAGCAATGT GGGACTTTGA 

541 TGGTCATCAA ACAAAGAACG AAGACGCCTC TTTTGCAAAG TTTTGTTTCG GCTACGGTGA 

601 AGAACTGGAT ACTTGTTGTG TCTTCTGTGT ATTTTTGTGG CAACAAGAGG CCAGAGACAA 

661 TCTATTCAAA CACCAAGCTT GCTCTTTTGA GCTACAAGAA CCTGTGGGGT ATATATCTAG

721 TGGCCAGAAT GCCTAGGTCA CCTCTAGAp A GTTGAAACTG CCTAAGATCT CGGGCCCTCG 

781 GGCTTCGGCT TTGGGTGTAC ATGTTTGTGC TCCGGGCAAA TGCAAAGTGT GGTAGGATCG 

841 ACACACTGCT GCCTTTACCA AGCAGCTGAG GGTATGTGAT AGGCAAATGT TCAGGGGCCA

901 CTGCATGGTT TCGAATAGAA AGAGAAGCTT AGCCAAGAAC AATAGCCGAT AAAGATAGCC 

961 TCATTAAACG AAATGAGCTA GTAGGCAAAG TCAGCGAATG TGTATATATA AAGGTTCGAG 

1021 GTCCGTGCCT CCCTCATGCT CTCCCCATCT ACTCATCAAC TCAGATCCTC CAGGAGACTT 



1081 GTACACCATC TTTTGAGGCA CAGAAACCCA ATAGTCAACC GCGGACTGCG CATCATG
- RESTRICTION SITES MARKED WITH "*" ARE NOT SINGLE SITES

- TWO ADDITIONAL EcoRI SITES IN THE cbh1 GENE

CCGCGGACTG CGCATCATGT 1740

ATCGGAAGTT GGCCGTCATC TCGGCCTTCT TGGCCACAGC TCGTGCTCAG TCGGCCTGCA 1800

CTCTCCAATC GGAGACTCAC CCGCCTCTGA CATGGCAGAA ATGCTCGTCT GGTGGCACTT 1860

GCACTCAACA GACAGGCTCC GTGGTCATCG ACGCCAACTG GCGCTGGACT CACGCTACGA 1920

ACAGCAGCAC GAACTGCTAC GATGGCAACA CTTGGAGCTC GACCCTATGT CCTGACAACG 1980

AGACCTGCGC GAAGAACTGC TGTCTGGACG GTGCCGCCTA CGCGTCCACG TACGGAGTTA 2040

CCACGAGCGG TAACAGCCTC TCCATTGGCT TTGTCACCCA GTCTGCGCAG AAGAACGTTG 2100

GCGCTCGCCT TTACCTTATG GGCAGCGACA CGACCTACCA GGAATTCACC CTGCTTGGCA 2160

ACGAGTTCTC TTTCGATGTT GATGTTTCGC AGCTGCCGTA AGTGACTTAC CATGAACCCC 2220

TGACGTATCT TCTTGTGGGC TCCCAGCTGA CTGGCCAATT TAAGGTGCGG CTTGAACGGA 2280

GCTCTCTACT TCCTGTCCAT GGACGCGGAT GGTGGCGTGA GCAAGTATCC CACCAACACC 2340

GCTGGCGCCA AGTACGGCAC GGGGTACTGT GACAGCCAGT GTCCCCGCGA TCTGAAGTTC 2400

ATCAATGGCC AGGCCAACGT TGAGGGCTGG GAGCCGTCAT CCAACAACGC AAACACGGGC 2460

ATTGGAGGAC ACGGAAGCTG CTGCTCTGAG ATGGATATCT GGGAGGCCAA CTCCATCTCC 2520

GAGGCTCTTA CCCCCCACCC TTGCACGACT GTCGGCCAGG AGATCTGCGA GGGTGATGGG 2580

TCCGGCGGAA CTTACTCCGA TAACAGATAT GGCGGCACTT GCGATCCCGA TGGCTGCGAC 2640

TGGAACCCAT ACCGCCTGGG CAACACCAGC TTCTACGGCC CTGGCTCAAG CTTTACCCTC 2700

GATACCACCA AGAAATTGAC CGTTGTCACC CAGTCCGAGA CGTCGGGTGC CATCAACCGA 2760

TACTATGTCC AGAATGGCGT CACTTTCCAG CAGCCCAACG CCGAGCTTGG TAGTTACTCT 2820

GGCAACGAGC TCAACGATGA TTACTGCACA GCTGAGGAGG CAGAATTCGG CGGATCCTCT 2880

TTCTCAGACA AGGGCGGCCT GACTCAGTTC AAGAAGGCTA CCTCTGGCGG CATGGTTCTG 2940

GTCATGAGTC TGTGGGATGA TGTGAGTTTG ATGGACAAAC ATGCGCGTTG ACAAAGAGTC 3000
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AAHPAGPTKA TTrAHATrTT APACTAPTAP rPPAArATrP TrnTXPTHflA PTPPAPPTAP 

AAuUAUUUA LIuAUAIull ALAblALIAL uLlAALAIuL luluuUuuA ULIAUUAL 


3060


rpnAPAAAPr ArAPPTPTTr pAPAPrpnrrr rppirrnrnrn itAA^PTrPTP pappatptpp 


3120


ftrTCTPPPTr PTPAnrTPrA ATPTPArTPT PPPAAPflPPA ArfTTPAPPTT PTPPAAPATP 

ublulLlUU LILAuulluA AILILAufU LUlAALulLA Abb It ALL II LIUIAALAJL 


3180


AAGTTCGGAC CCATTGGCAG CACCGGCAAC CCTAGCGGCG GCAACCCTCC CGGCGGAAAC 3240

CCGCCTGGCA CCACCACCAC CCGCCGCCCA GCCACTACCA CTGGAAGCTC TCCCGGACCT 3300

ACCCAGTCTC ACTACGGCCA GTGCGGCGGT ATTGGCTACA GCGGCCCCAC GGTCTGCGCC 3360

AGCGGCACAA CTTGCCAGGT CCTGAACCCT TACTACTCTC AGTGCCTGTA AAGCTCCGTG 3420

CGAAAGCCTG ACGCACCGGT AGATTCTTGG TGAGCCCGTA TCATGACG5C GGDGGGAGCT 3480

ACATGGCCCC GGBTGATTTA TTTTTTTTGT ATCTACTTCT GACCCTTTTC AAATATACGG 3540
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1381 
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1441 


GCTAAAAGTA


CATAAGTTAA 
1561 AAGCCCACTT ACCCACGTTT GTTTCTTCAC TCAGTCCAAT CTCAGCTGGT GATCCCCCAA
1621 TTGGGTCGCT TGTTTGTTCC GGTGAAGTGA AAGAAGACAG AGGTAAGAAT GTCTGACTCG
1681 GAGCGTTTTG CATACAACCA AGGGCAGTGA TGGAAGACAG TGAAATGTTG ACATTCAAGG
1741 AGTATTTAGC CAGGGATGCT TGAGTGTATC GTGTAAGGAG GTTTGTCTGC CGATACGACG
1801 AATACTGTAT AGTCACTTCT GATGAAGTGG TCCATATTGA AATGTAAGTC GGCACTGAAC
1861 AGGCAAAAGA TTGAGTTGAA ACTGCCTAAG ATCTCGGGCC CTCGGGCTTC GGCTTTGGGT
1921 GTACATGTTT GTGCTCCGGG CAAATGCAAA GTGTGGTAGG ATCGACACAC TGCTGCCTTT
1981 ACCAAGCAGC TGAGGGTATG TGATAGGCAA ATGTTCAGGG GCCACTGCAT GGTTTCGAAT
2041 AGAAAGAGAA GCTTAGCCAA GAACAATAGC CGATAAAGAT AGCCTCATTA AACGAAATGA
2101 GCTAGTAGGC AAAGTCAGCG AATGTGTATA TATAAAGGTT CGAGGTCCGT GCCTCCCTCA
2161 TGCTCTCCCC ATCTACTCAT CAACTCAGAT CCTCCAGGAG ACTTGTACAC CATCTTTTGA
2221 GGCACAGAAA CCCAATAGTC AACCGCGGAC TGCGCATCATG

FIG.18A(Cont.) 






WO 94/04673 PCT/FI93/00330

42/47 

10 20 30 40 50 60

I I I I I I 

1 CAATTCTCAC GGTGAATGTA GGCCTTTTGT AGGGTAGGAA TTGTCACTCA AGCACCCCCA

61 ACCTCCATTA CGCCTCCCCC ATAGAGTTCC CAATCAGTGA GTCATGGCAC TGTTCTCAAA

121 TAGATTGGGG AGAAGTTGAC TTCCGCCCAG AGCTGAAGGT CGCACAACCG CATGATATAG

181 GGTCGGCAAC GGCAAAAAAG CACGTGGCTC ACCGAAAAGC AAGATGTTTG CGATCTAACA

241 TCCAGGAACC TGGATACATC CATCATCACG CACGACCACT TTGATCTGCT GGTAAACTCG

301 TATTCGCCCT AAACCGAAGT GCGTGGTAAA TCTACACGTG GGCCCCTTTC GGTATACTGC

361 GTGTGTCTTC TCTAGGTGCA TTCTTTCCTT CCTCTAGTGT TGAATTGTTT GTGTTGGGAG

421 TCCGAGCTGT AACTACCTCT GAATCTCTGG AGAATGGTGG ACTAACGACT ACCGTGCACC

481 TGCATCATGT ATATAATAGT GATCCTGAGA AGGGGGGTTT GGAGCAATGT GGGACTTTGA

541 TGGTCATCAA ACAAAGAACG AAGACGCCTC TTTTGCAAAG TTTTGTTTCG GCTACGGTGA 

601 AGAACTGGAT ACTTGTTGTG TCTTCTGTGT ATTTTTGTGG CAACAAGAGG CCAGAGACAA 

661 TCTATTCAAA CACCAAGCTT GCTCTTTTGA GCTACAAGAA CCT ffETSSB T ATATATCTAG 

+ 

721 TGGCCAGAAT GCCTAGGTCA CCTCTAAATG TGTAATTTGC CTGCTTGACC GATCTAAACT

781 GTTCGAAGCC CGAATGTAGG ATTGTTATCC GAACTCTGCT CGTAGAGGCA TGTTGTGAAT

841 CTGTGTCGGG CAGGACACGC CTCGAAGGTT CACGGCAAGG GAAACCACCG ATAGCAGTGT

901 CTAGTAGCAA CCTGTAAAGC CGCAATGCAG CATCACTGGA AAATACAAAC CAATGGCTAA

961 AAGTACATAA GTTAATGCCT AAAGAAGTCA TATACCAGCG GCTAATAATT GTACAATCAA

1021 GTGGCTAAAC GTACCGTAAT TTGCCAACGC GT TlTETAGS f TGCAGAAGCA CGGCAAAGCC 

1081 CACTTACCCA CGTTTGTTTC TTCACTCAGT CCAATCTCAG CTGGTGATCC CCCAATTGGG

1141 TCGCTTGTTT GTTCCGGTGA AGTGAAAGAA GACAGAGGTA AGAATGTCTG ACTCGGAGCG

1201 TTTTGCATAC AACCAAGGGC AGTGATGGAA GACAGTGAAA TGTTGACATT CAAGGAGTAT

1261 TTAGCCAGGG ATGCTTGAGT GTATCGTGTA AGGAGGTTTG TCTGCCGATA CGACGAATAC

1321 TGTATAGTCA CTTCTGATGA AGTGGTCCAT ATTGAAATGT AAGTCGGCAC TGAACAGGCA

1381 AAAGATTGAG TTGAAACTGC CTAAGATCTC GGGCCCTCGG GCTTCGGCTT TGGGTGTACA

1441 TGTTTGTGCT CCGGGCAAAT GCAAAGTGTG GTAGGATCGA CACACTGCTG CCTTTACCAA

1501 GCAGCTGAGG GTATGTGATA GGCAAATGTT CAGGGGCCAC TGCATGGTTT CGAATAGAAA

1561 GAGAAGCTTA GCCAAGAACA ATAGCCGATA AAGATAGCCT CATTAAACGA AATGAGCTAG

1621 TAGGCAAAGT CAGCGAATGT GTATATATAA AGGTTCGAGG TCCGTGCCTC CCTCATGCTC

1681 TCCCCATCTA CTCATCAACT CAGATCCTCC AGGAGACTTG TACACCATCT TTTGAGGCAC

1741 AGAAACCCAA TAGTCAACCG CGGACTGCGC ATCATG

AACTACCTCT GAATCTCTGG AGAATGGTGG ACTAACGACT ACCGTGCACC

481 TGCATCATGT ATATAATAGT GATCCTGAGA AGGGGGGTTT GGAGCAATGT GGGACTTTGA

541 TGGTCATCAA ACAAAGAACG AAGACGCCTC TTTTGCAAAG TTTTGTTTCG GCTACGGTGA

601 AGAACTGGAT ACTTGTTGTG TCTTCTGTGT ATTTTTGTGG CAACAAGAGG CCAGAGACAA

661 TCTATTCAAA CACCAAGCTT GCTCTTTTGA GCTACAAGAA CCTfTCTAA^fT ATATATCTAG

721 TGGCCAGAAT GCCTAGGTCA CCTCTAAATG TGTAATTTGC CTGCTTGACC

781 GTTCGAAGCC CGAATGTAGG ATTGTTATCC GAACTCTGCT CGTAGAGGCA TGTTGTGAAT

841 CTGTGTCGGG CAGGACACGC CTCGAAGGTT CACGGCAAGG GAAACCACCG ATAGCAGTGT

901 CTAGTAGCAA CCTGTAAAGC CGCAATGCAG CATCACTGGA AAATACAAAC CAATGGCTAA

961 AAGTACATAA GTTAATGCCT AAAGAAGTCA TATACCAGCG GCTAATAATT GTACAATCAA

1021 GTGGCTAAAC GTACCGTAAT TTGCCAACGC GTTfTCTAGAfr TGCAGAAGCA CGGCAAAGCC

1081 CACTTACCCA CGTTTGTTTC TTCACTCAGT CCAATCTCAG CTGGTGATCC CCCAATTGGG

1141 TCGCTTGTTT GTTCCGGTGA AGTGAAAGAA GACAGAGGTA AGAATGTCTG ACTCGGAGCG

1201 TTTTGCATAC AACCAAGGGC AGTGATGGAA GACAGTGAAA TGTTGACATT CAAGGAGTAT

1261 TTAGCCAGGG ATGCTTGAGT GTATCGTGTA AGGAGGTTTG TCTGCCGATA CGACGAATAC

1321 TGTATAGTCA CTTCTGATGA AGTGGTCCAT ATTGAAATGT AAGTCGGCAC TGAACAGGCA

1381 AAAGATTGAG TTGAAACTGC CTAAGATCTC GGGCCCTCGG GCTTCGGCTT TGGGTGTACA

1441 TGTTTGTGCT CCGGGCAAAT GCAAAGTGTG GTAGGATCGA CACACTGCTG CCTTTACCAA

1501 GCAGCTGAGG GTATGTGATA GGCAAATGTT CAGGGGCCAC TCCATGGTTT CGAATAGAAA

1561 GAGAAGCTTA GCCAAGAACA ATAGCCGATA AAGATAGCCT CATTAAACGA AATGAGCTAG

1621 TAGGCAAAGT CAGCGAATGT GTATATATAA AGGTTCGAGG TCCGTGCCTC CCTCATGCTC

1681 TCCCCATCTA CTCATCAACT CAGATCCTCC AGGAGACTTG TACACCATCT TTTGAGGCAC

1741 AGAAACCCAA TAGTCAACCG CGGACTGCGC ATCATG

FIG.18C(Cont.) 
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